252grass3-051 3/24/05 (Open this document in 'Page Layout' view!)

Name:
Class days and time:
Please include this on what you hand in!

Graded Assignment 3

In your outline there are 6 methods to compare means or medians, methods D1, D2, D3, D4, D5a and D5b. Method D6 compares proportions and method D7 compares variances or standard deviations. In all the following cases, identify $H_0$ and $H_1$ and identify which method to use. If the hypotheses involve a mean, state the hypotheses in terms of both $\mu$ and $D = \mu_1 - \mu_2$. If the hypotheses involve a proportion, state them in terms of both $p$ and $\Delta p = p_1 - p_2$. If the hypotheses involve standard deviations or variances, state them in terms of both $\sigma^2$ and $\sigma_1^2/\sigma_2^2$ or $\sigma_2^2/\sigma_1^2$. All the questions involve means, medians, proportions or variances. (Most problems are highly edited versions of problems in McClave, et al.)

Note: Look at 252thngs in the syllabus supplement before you start (and before you take exams).

------------------------------------------------------------------------------------------

1. We have the amount spent by a sample of 25 pharmaceutical firms on research in 2002 and again for the same firms in 2004. We assume that the underlying distributions are Normal. Was the average amount spent in 2004 above the average amount spent in 2002?

Solution: You are comparing means before and after the passing of 2 years. You can get away with using means because the parent distributions are Normal. If $\mu_2$ is the mean of the 2004 sample, you are testing that $\mu_2 > \mu_1$, which, because it contains no equality, is an alternate hypothesis. So your hypotheses are $H_0: \mu_1 \ge \mu_2$, $H_1: \mu_1 < \mu_2$ or $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$. If $D = \mu_1 - \mu_2$, then $H_0: D \ge 0$, $H_1: D < 0$. The important thing to notice here is that the data are in before and after pairs, so you use Method D4. Of course, if you took $\mu_2$ as the mean of the 2002 sample, you have $H_0: \mu_1 \le \mu_2$, $H_1: \mu_1 > \mu_2$, etc.
Also, you might have found it more natural to define $D$ as $\mu_2 - \mu_1$. In these cases we have $H_0: D \le 0$, $H_1: D > 0$.

2. Assume that you had the same data and a similar task to Problem 1 but your preliminary analysis indicated that the underlying distributions were highly skewed to the right.

Solution: The presence of skewness, especially with small samples, means that you should be using a nonparametric (rank) method. There are two reasons for this. First, for small samples the sample mean is not reliably Normally distributed, and, second, even for a large sample, the median of a skewed distribution is more 'typical' than the mean. If $\eta$ is the median, $H_0: \eta_1 \ge \eta_2$, $H_1: \eta_1 < \eta_2$. Since we are comparing medians and the data are paired, use Method D5b.

3. You have the grades of a sample of 15 traditional university students and the grades of a sample of 13 non-traditional university students. Do these differ on average? Assume that preliminary analysis indicates that the variances of grades of the two groups are similar and presume an underlying Normal distribution.

Solution: $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$. If $D = \mu_1 - \mu_2$, then $H_0: D = 0$, $H_1: D \ne 0$. Because you believe that the Normal distribution applies, you use a method that compares means. The total sample size is too small to use Method D1, which means that D2 or D3 should work. You have tested the variances and not disproved equality, so use D2.

4. Since the results of Problem 3 were inconclusive, you take new samples of 250 traditional university students and 150 non-traditional students. Again you assume that the underlying distribution is Normal, but you do not bother to compare variances. You are still trying to find out if grades of the two groups differ.

Solution: $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$. If $D = \mu_1 - \mu_2$, then $H_0: D = 0$, $H_1: D \ne 0$.
Because you believe that the Normal distribution applies, you use a method that compares means. The total sample size is large, so this is the place where most people would use Method D1. But, if a computer is handy, there is no reason not to use D3, nor would you expect different results from D1.

5. A recheck of the data in Problem 4 indicates that the underlying distribution is far from Normal, but you are still trying to find out if the grades of the two groups differ.

Solution: For the second reason given in Problem 2, caution should rule out Method D1. If $\eta$ is the median, $H_0: \eta_1 = \eta_2$, $H_1: \eta_1 \ne \eta_2$. Since we are comparing medians and the data are not paired, use Method D5a.

6. If 10 out of a group of 80 randomly chosen people who have received information on your product are inclined to buy it and 12 out of a group of 81 randomly chosen people are inclined to buy your product after receiving the same information and seeing your commercial, does the commercial increase the proportion who will buy your product?

Solution: If group 2 are those who saw the commercial, the sample proportions are $\bar p_1 = x_1/n_1$ and $\bar p_2 = x_2/n_2$. We have asked if the commercial increased the proportion, $p_1 < p_2$, which must be an alternate hypothesis because it does not contain an equality. $H_0: p_1 \ge p_2$, $H_1: p_1 < p_2$ or $H_0: p_1 - p_2 \ge 0$, $H_1: p_1 - p_2 < 0$. If $\Delta p = p_1 - p_2$, then $H_0: \Delta p \ge 0$, $H_1: \Delta p < 0$. Since we are comparing proportions, use Method D6. Note that some people may find it more natural to define $\Delta p$ as $p_2 - p_1$, so that we have $H_0: \Delta p \le 0$, $H_1: \Delta p > 0$.

7. Have new procedures decreased the variability of delivery times? Two samples have been taken and you know $\bar x_1$ and $s_1$, taken before the new procedures were instituted, and $\bar x_2$ and $s_2$, taken afterwards. There seems to be little difference between average delivery times before and after.

Solution: If $\sigma_1$ is the standard deviation before the new procedures are instituted, $H_0: \sigma_1^2 \le \sigma_2^2$, $H_1: \sigma_1^2 > \sigma_2^2$ or $H_0: \sigma_1 \le \sigma_2$, $H_1: \sigma_1 > \sigma_2$.
In terms of the variance ratio $\sigma_1^2/\sigma_2^2$ or $\sigma_2^2/\sigma_1^2$, the alternate hypothesis rules, so $H_0: \sigma_1^2/\sigma_2^2 \le 1$, $H_1: \sigma_1^2/\sigma_2^2 > 1$, which is equivalent to $H_0: \sigma_2^2/\sigma_1^2 \ge 1$, $H_1: \sigma_2^2/\sigma_1^2 < 1$. Since you are comparing variances, use Method D7.

8. A paper company wishes to know whether a new procedure has decreased the amount of time it takes to unload trucks. What are the hypotheses this implies? Two samples are taken with the results below. How do we decide whether to use Method D2 or D3? What hypotheses, etc. do we test to make this decision?

Old Method                       New Method
$n_1 = 50$                       $n_2 = 50$
$\bar x_1 = 25.4$ minutes        $\bar x_2 = 27.3$ minutes
$s_1 = 3.1$ minutes              $s_2 = 3.7$ minutes

Solution: We have been asked to decide between Methods D2 and D3. The choice depends on the equality of variances. If $\sigma_1$ is the standard deviation under the old method, $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \ne \sigma_2^2$ or $H_0: \sigma_1 = \sigma_2$, $H_1: \sigma_1 \ne \sigma_2$. In terms of the variance ratio $\sigma_1^2/\sigma_2^2$ or $\sigma_2^2/\sigma_1^2$, both are equivalent in a 2-sided test, so $H_0: \sigma_1^2/\sigma_2^2 = 1$, $H_1: \sigma_1^2/\sigma_2^2 \ne 1$ or $H_0: \sigma_2^2/\sigma_1^2 = 1$, $H_1: \sigma_2^2/\sigma_1^2 \ne 1$ are both fine. Supposedly, you would test both $s_1^2/s_2^2$ against $F_{\alpha/2}^{(DF_1, DF_2)}$ and $s_2^2/s_1^2$ against $F_{\alpha/2}^{(DF_2, DF_1)}$, but actually only one of these ratios will be above 1 and need testing. Since you are comparing variances, use Method D7.

9. You have the following data and want to see if means are similar before and after. Assume that the parent distributions are Normal.

Person   Before   After   Difference   Squared
1        82       92      -10          100
2        60       72      -12          144
3        55       57      -2           4
4        97       104     -7           49
5        79       89      -10          100
Sum                       -41          397

Solution: You are comparing means before and after. If $\mu_2$ is the mean of the second sample, you are testing that $\mu_1 = \mu_2$, which, because it contains an equality, is a null hypothesis. So your hypotheses are $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$. If $D = \mu_1 - \mu_2$, then $H_0: D = 0$, $H_1: D \ne 0$. The important thing to notice here is that the data are in before and after pairs, so you use Method D4.
I have added the material in red on the right to remind you that $\bar d = \bar x_1 - \bar x_2$ and $s_d^2 = \frac{\sum d^2 - n \bar d^2}{n - 1}$, so that $\bar d = \frac{\sum d}{n} = \frac{-41}{5} = -8.20$ and $s_d^2 = \frac{397 - 5(-8.20)^2}{4} = 15.20$.

10. An experiment at Duke University was conducted to see if ability to identify food by taste and smell decreases with age. One food (mushed apple) was correctly identified by 81 out of 100 students and by 51 out of 100 older people.

Solution: With the students as group 1 and the older people as group 2, $\bar p_1 = x_1/n_1$ and $\bar p_2 = x_2/n_2$. $H_0: p_1 \le p_2$, $H_1: p_1 > p_2$ or $H_0: p_1 - p_2 \le 0$, $H_1: p_1 - p_2 > 0$. If $\Delta p = p_1 - p_2$, then $H_0: \Delta p \le 0$, $H_1: \Delta p > 0$. Since we are comparing proportions, use Method D6.

Important note: Two problems from last year's exam should serve as a warning, especially on the final exam.

11. You have interviewed a sample of 80 small businesses in the Northeast and 75 small businesses in the Southeast. Each business has indicated whether it sells in foreign markets. You want to show that businesses in the Northeast are more likely to export. ($x_1$ is the total number of firms that export in the Northeast sample, $x_2$ in the Southeast.)

Solution: $\bar p_1 = x_1/n_1$ and $\bar p_2 = x_2/n_2$. $H_0: p_1 \le p_2$, $H_1: p_1 > p_2$ or $H_0: p_1 - p_2 \le 0$, $H_1: p_1 - p_2 > 0$. If $\Delta p = p_1 - p_2$, then $H_0: \Delta p \le 0$, $H_1: \Delta p > 0$. Since we are comparing proportions, use Method D6.

12. You expand the sample in Problem 11 by adding 60 small businesses in the Midwest ($x_3$ is the number of these that export). You test the hypothesis that the same fraction of businesses export in each region.

Solution: $\bar p_1 = x_1/n_1$, $\bar p_2 = x_2/n_2$ and $\bar p_3 = x_3/n_3$. $H_0: p_1 = p_2 = p_3$, $H_1$: not all $p$s equal. This is a chi-squared test of homogeneity. Since we are comparing multiple proportions, use a chi-squared test. O.K. How would you set it up?

Extra Credit: In Method D6, we assume that we are comparing proportions from two independent samples. In the McNemar Test we compare two proportions taken from the same sample.
Assume that two different questions are asked of the same group with the following responses.

                     question 2
                     yes        no
question 1   yes     $x_{11}$   $x_{12}$
             no      $x_{21}$   $x_{22}$

So, for example, $x_{21}$ is the number of people who answered no to question 1 and yes to question 2. $x_{11} + x_{12} + x_{21} + x_{22} = n$, $p_1 = \frac{x_{11} + x_{12}}{n}$ and $p_2 = \frac{x_{11} + x_{21}}{n}$. If we wish to test $H_0: p_1 = p_2$, $H_1: p_1 \ne p_2$, let $z = \frac{x_{12} - x_{21}}{\sqrt{x_{12} + x_{21}}}$. (The test is valid only if $x_{12} + x_{21} \ge 10$.)

A famous example of this concerns a debate between candidates. Question 1 is whether the respondent supports candidate 1 before the debate and question 2 is whether the respondent supports candidate 1 after the debate. The data are

                     question 2
                     yes   no
question 1   yes     27    7
             no      13    28

and the question is whether the debate has changed the fraction supporting candidate 1. Write this out as a hypothesis test and do the test.

Solution: $H_0: p_1 = p_2$, $H_1: p_1 \ne p_2$ or $H_0: p_1 - p_2 = 0$, $H_1: p_1 - p_2 \ne 0$. This is a two-sided test, so if we use a 5% significance level, our rejection regions are below $-z_{.025} = -1.96$ and above $z_{.025} = 1.96$. $z = \frac{x_{12} - x_{21}}{\sqrt{x_{12} + x_{21}}} = \frac{7 - 13}{\sqrt{7 + 13}} = \frac{-6}{\sqrt{20}} = -\sqrt{\frac{36}{20}} = -\sqrt{1.8} = -1.34$, and we cannot reject the null hypothesis. If we use a p-value, $2P(z \le -1.34) = 2(.5 - .4099) = 2(.0901) = .1802$, so we could not reject the null hypothesis at a 5% significance level, or even at a 10% level. If you (wrongly, but understandably) thought that the hypotheses were $H_0: p_1 \ge p_2$, $H_1: p_1 < p_2$ or $H_0: p_1 - p_2 \ge 0$, $H_1: p_1 - p_2 < 0$, the 5% rejection region would be below $-z_{.05} = -1.645$ and we still could not reject the null hypothesis.

Note: This is a version of the Chi-Square Test. Recall that $\chi^2 = \sum \frac{(O - E)^2}{E}$. If we take $x_{11}$ and $x_{22}$ as given, and assume that the null hypothesis is correct, then the table already given,

                     question 2
                     yes        no
question 1   yes     $x_{11}$   $x_{12}$
             no      $x_{21}$   $x_{22}$

is our $O$, and the numbers in the $x_{12}$ and the $x_{21}$ slots must be equal for there to be no change in preferences, so that our $E$ is

                     question 2
                     yes                           no
question 1   yes     $x_{11}$                      $\frac{x_{12} + x_{21}}{2}$
             no      $\frac{x_{12} + x_{21}}{2}$   $x_{22}$

This means that two of the four terms in $\chi^2 = \sum \frac{(O - E)^2}{E}$ are zero and the remaining terms are

$$\chi^2 = \frac{\left(x_{12} - \frac{x_{12} + x_{21}}{2}\right)^2}{\frac{x_{12} + x_{21}}{2}} + \frac{\left(x_{21} - \frac{x_{12} + x_{21}}{2}\right)^2}{\frac{x_{12} + x_{21}}{2}} = \frac{2\left(\frac{x_{12} - x_{21}}{2}\right)^2}{\frac{x_{12} + x_{21}}{2}} = \frac{(x_{12} - x_{21})^2}{x_{12} + x_{21}}.$$

But $\chi^2$ has only one degree of freedom, and, since $\chi^2$ is defined as a sum of squared $z$ values, here it is a single $z^2$, so we can take a square root and say $z = \frac{x_{12} - x_{21}}{\sqrt{x_{12} + x_{21}}}$.
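
The relationship between the chi-square form and the $z$ form can be checked numerically. The following is a minimal Python sketch, assuming the discordant counts from the debate example ($x_{12} = 7$, $x_{21} = 13$). The function names are illustrative and are not part of the course outline, and the p-value uses the exact Normal distribution rather than a rounded table value.

```python
from math import sqrt
from statistics import NormalDist


def mcnemar_z(x12, x21):
    """z statistic built from the two discordant cells of the paired table."""
    return (x12 - x21) / sqrt(x12 + x21)


def chi_square_discordant(x12, x21):
    """Chi-square from O and E in the two discordant cells.

    Under the null hypothesis both cells have the same expected count,
    (x12 + x21) / 2; the x11 and x22 cells contribute zero.
    """
    expected = (x12 + x21) / 2
    return ((x12 - expected) ** 2 + (x21 - expected) ** 2) / expected


# Debate example: 7 switched from yes to no, 13 switched from no to yes.
x12, x21 = 7, 13

z = mcnemar_z(x12, x21)
chi2 = chi_square_discordant(x12, x21)

# Two-sided p-value from the standard Normal distribution.
p_two_sided = 2 * NormalDist().cdf(-abs(z))

print(f"z = {z:.2f}")
print(f"chi-square = {chi2:.2f}, z squared = {z * z:.2f}")
print(f"two-sided p-value = {p_two_sided:.4f}")
```

The chi-square value and $z^2$ agree, which is the point of the derivation: with one degree of freedom, the two tests are the same test.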