Defects – Is the day of the week important?
Question #6 TGIF

Intent of the Question
The primary goals of this question were to assess a student's ability to (1) determine whether the conditions for applying a particular inference procedure are satisfied; (2) calculate the p-value for a binomial test and draw an appropriate conclusion; (3) calculate a test statistic based on signed rank data; and (4) use simulation results to draw an appropriate conclusion.

Scoring
This question is scored in four sections. Section 1 consists of parts (a) and (b), section 2 consists of part (c), section 3 consists of part (d), and section 4 consists of part (e). Each of the four sections is scored as essentially correct (E), partially correct (P), or incorrect (I).

Question #6
In a large manufacturing company, every item produced is inspected for defects and goes through a repair process if it has serious defects. Management wanted to investigate whether items produced on Mondays are more likely to require repairing than items produced on the midweek day, Wednesday. A random sample of 9 weeks from the past 5 years was taken, and the numbers of items that required repairing on the Monday and the Wednesday of each sampled week are shown in the table below.

Week   Monday   Wednesday   Difference   More Repairing on Monday   Signed Rank of |Difference|
A      17       18          −1           NO                         −1
B      14       17          −3           NO                         −2
C      19       12          7            YES                        6
D      21       17          4            YES                        3
E      18       13          5            YES
F      19       10          9            YES
G      24       4           20           YES
H      25       17          8            YES
I      22       16          6            YES

Part (a) – Section 1
A boxplot of the differences in the number of items that required repairing on Monday and on Wednesday in this sample of 9 weeks is shown below. After seeing the boxplot of the differences, management determined that the matched pair t-test of

$H_0\colon \mu_{\text{difference}} = 0$ versus $H_a\colon \mu_{\text{difference}} > 0$

was not appropriate, where $\mu_{\text{difference}}$ is the mean of the differences in the number of produced items that required repairing on Monday and on Wednesday for all weeks in the past 5 years. Explain why.
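The outlier that the boxplot shows can be checked numerically with the 1.5 × IQR rule. The Python sketch below is illustrative only: it assumes the quartile convention used by TI-83/84 calculators (when n is odd, the median is left out of both halves), and the helper names `median`, `quartiles`, and `iqr_outliers` are hypothetical. With a sample this small, other quartile conventions can produce different fences.

```python
# Repair counts for weeks A-I, copied from the data table.
monday = [17, 14, 19, 21, 18, 19, 24, 25, 22]
wednesday = [18, 17, 12, 17, 13, 10, 4, 17, 16]
differences = [m - w for m, w in zip(monday, wednesday)]


def median(values):
    """Median of a non-empty list of numbers."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves of the data.

    Assumption: for odd n the overall median is left out of both halves
    (the TI-83/84 convention).
    """
    s = sorted(values)
    lower = s[: len(s) // 2]
    upper = s[(len(s) + 1) // 2 :]
    return median(lower), median(upper)


def iqr_outliers(values):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


print(differences)                # [-1, -3, 7, 4, 5, 9, 20, 8, 6]
print(quartiles(differences))     # (1.5, 8.5)
print(iqr_outliers(differences))  # [20]
```

Here IQR = 8.5 − 1.5 = 7, so the upper fence is 8.5 + 10.5 = 19, and only the difference of 20 lies beyond it.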
Part (b) – Section 1
A different possible set of hypotheses for this investigation could be

$H_0\colon p = 0.5$ versus $H_a\colon p > 0.5$

where p is the proportion of weeks in which Monday had more produced items that required repairing than Wednesday. Explain why the one-sample proportion z-test would not be appropriate for these data.

Section 1 Solutions

Solution Part (a)
One of the conditions for the matched pair t-test is approximate normality of the population of differences. For a small data set (n < 30, or n < 25 by some conventions), this is usually checked graphically by looking for extreme skewness and outliers. The boxplot identifies the difference value of 20 as an outlier, so it may not be reasonable to believe that the differences come from a population with an approximately normal distribution.

Solution Part (b)
The one-sample proportion z-test relies on the normal approximation to the binomial distribution, which is generally used only when np ≥ 10 and n(1 − p) ≥ 10 (some texts use 5 instead of 10), where p is the hypothesized value in the null hypothesis. For these data, np = 9(0.5) = 4.5 (and likewise n(1 − p) = 4.5), which is too small under either threshold.

Scoring Section 1
Essentially correct (E) if the response includes the following two components:
1) Concludes that it is not reasonable to assume the differences are from a population that is normally distributed because of the outlier.
2) Calculates the value of np in part (b) AND concludes that np is too small to use the normal approximation to the binomial distribution.
Partially correct (P) if the response includes only one of the two components listed above.
Incorrect (I) if the response does not meet the criteria for E or P.

Part (c) – Section 2
A sign test of the hypotheses

$H_0\colon p = 0.5$ versus $H_a\colon p > 0.5$

can be used when the one-sample proportion z-test is not appropriate. The test statistic for the sign test is X = the number of weeks, out of the 9 sampled weeks, in which more items required repairing on Monday than on Wednesday.
Assuming that the null hypothesis is true (that is, Monday and Wednesday are equally likely to be the day with more produced items that require repairing), calculate the p-value P(X ≥ 7) and use this p-value to state the conclusion of the sign test at significance level 𝛼 = 0.05.

Part (c) – Section 2 Solution
Assuming that the null hypothesis is true, X follows a binomial distribution with n = 9 and p = 0.5, so

$P(X \ge 7) = 1 - P(X \le 6) = \dfrac{\binom{9}{7} + \binom{9}{8} + \binom{9}{9}}{2^9} = \dfrac{36 + 9 + 1}{512} = 0.08984375$

(for example, from the calculator). Since the p-value of approximately 0.09 is larger than 𝛼 = 0.05, the null hypothesis is not rejected. Thus, the sign test does not provide sufficient evidence to conclude that the proportion of weeks in which more items require repairing on Monday than on Wednesday is greater than 0.5.

Section 2 is scored as follows:
Essentially correct (E) if the response includes the following two components:
1) The p-value is correctly calculated from the binomial distribution.
2) A correct conclusion is provided in context, with justification based on linkage between the p-value and the given 𝛼 = 0.05.
Partially correct (P) if the response includes only one of the two components listed above.
Incorrect (I) if the response does not meet the criteria for E or P.

Notes:
The conclusion must be related to the alternative hypothesis.
If the p-value is incorrect, section 2 is scored as P if the response includes proper linkage and a conclusion in context consistent with that p-value.
If the p-value is correct and greater than 0.05, wording that states or implies that the null hypothesis is accepted lowers the score from E to P in section 2.

Part (d) – Section 3
A signed rank test uses both the ranks of the absolute values of the differences and the signs of the differences to test the hypotheses

$H_0$: The distributions of the numbers of items that require repairing on Mondays and on Wednesdays are the same.
$H_a$: The distribution of the numbers of items that require repairing on Mondays is shifted to the right of the distribution of the numbers of items that require repairing on Wednesdays.

The test statistic for the signed rank test is the sum of the positive ranks. Calculate the test statistic for the signed rank test by completing the Signed Rank of |Difference| column in the table at the beginning of the problem and then adding up the positive ranks.

Part (d) – Section 3 Solution

Week   Monday   Wednesday   Difference   More Repairing on Monday   Signed Rank of |Difference|
A      17       18          −1           NO                         −1
B      14       17          −3           NO                         −2
C      19       12          7            YES                        6
D      21       17          4            YES                        3
E      18       13          5            YES                        4
F      19       10          9            YES                        8
G      24       4           20           YES                        9
H      25       17          8            YES                        7
I      22       16          6            YES                        5

Thus, the sum of the positive ranks is 6 + 3 + 4 + 8 + 9 + 7 + 5 = 42.

Section 3 is scored as follows:
Essentially correct (E) if the response includes the following two components:
1) The signed ranks of the differences are correct.
2) The signed rank test statistic is correctly calculated from the given signed ranks.
Partially correct (P) if the response includes only one of the two components listed above.
Incorrect (I) if the response does not meet the criteria for E or P.

Part (e) – Section 4
Under the null hypothesis that the distributions of the numbers of items requiring repairing on Mondays and on Wednesdays are the same, 1,000 simulations were performed and the signed rank test statistic was calculated for each simulation. The frequency table below provides the frequencies for these 1,000 simulated signed rank test statistics.

Signed Rank Statistic Value   0    1    2    3    4    5    ⋯    40   41   42   43   44   45
Frequency                     2    3    4    5    8    15   ⋯    17   10   6    4    3    1

Based on the value of the signed rank test statistic calculated in part (d) and the distribution of the 1,000 simulated signed rank test statistics above, what should be the conclusion for the manufacturing company when comparing the numbers of items requiring repairing on Mondays and on Wednesdays?
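The hand calculations behind parts (c), (d), and (e) can be reproduced with a short Python sketch. It uses only the data in the tables above. The dictionary `upper_tail_frequencies` (a name chosen for illustration) keeps only the frequencies shown explicitly for values 40 through 45; that is enough here because every elided value (6 through 39) is below the observed statistic of 42. The ranking step assumes there are no tied or zero differences, which holds for these data.

```python
from math import comb

# Repair counts for weeks A-I, copied from the data table.
monday = [17, 14, 19, 21, 18, 19, 24, 25, 22]
wednesday = [18, 17, 12, 17, 13, 10, 4, 17, 16]
differences = [m - w for m, w in zip(monday, wednesday)]
n = len(differences)

# Part (c): sign test. X counts the weeks with more repairs on Monday.
x_obs = sum(1 for d in differences if d > 0)
# Under H0, X ~ Binomial(n = 9, p = 0.5), so the p-value is P(X >= x_obs).
sign_p_value = sum(comb(n, k) for k in range(x_obs, n + 1)) / 2**n

# Part (d): rank |difference| from 1 (smallest) to n, then reattach signs.
# Assumes no ties and no zero differences, which holds for these data.
order = sorted(range(n), key=lambda i: abs(differences[i]))
ranks = [0] * n
for rank, i in enumerate(order, start=1):
    ranks[i] = rank
signed_ranks = [r if d > 0 else -r for r, d in zip(ranks, differences)]
w_plus = sum(r for r in signed_ranks if r > 0)

# Part (e): simulated p-value = share of the 1,000 simulated statistics
# that are at least as large as the observed w_plus.
upper_tail_frequencies = {40: 17, 41: 10, 42: 6, 43: 4, 44: 3, 45: 1}
n_simulations = 1000
sim_p_value = sum(
    freq for value, freq in upper_tail_frequencies.items() if value >= w_plus
) / n_simulations

print(x_obs, sign_p_value)  # 7 0.08984375
print(signed_ranks)         # [-1, -2, 6, 3, 4, 8, 9, 7, 5]
print(w_plus, sim_p_value)  # 42 0.014
```

Because the null hypothesis gives p = 0.5, the binomial p-value is computed exactly as a count of outcomes over 2^9 = 512 rather than through a normal approximation, matching the reasoning in part (b).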
Part (e) – Section 4 Solution
The proportion of the 1,000 simulations in which the signed rank test statistic was at least as extreme as the observed test statistic value of 42 is (6 + 4 + 3 + 1)/1,000 = 0.014. Because the simulated p-value of 0.014 is small (less than the conventional significance levels 𝛼 = 0.10 and 𝛼 = 0.05, though not less than 𝛼 = 0.01), the null hypothesis that the distributions of the numbers of items that require repairing on Mondays and on Wednesdays are the same is rejected at the 𝛼 = 0.05 level. Thus, based on the signed rank test, there is convincing evidence that the distribution of the numbers of items that require repairing on Mondays is shifted to the right of the distribution of the numbers of items that require repairing on Wednesdays.

Section 4 is scored as follows:
Essentially correct (E) if the response includes the following two components:
1) The value of the test statistic in part (d) is used to correctly calculate the simulated p-value from the provided simulation results.
2) A correct conclusion is provided in context, with justification based on the linkage between the size of the p-value and the conclusion.
Partially correct (P) if the response includes only one of the two components listed above.
Incorrect (I) if the response does not meet the criteria for E or P.

Notes:
The conclusion must be related to the alternative hypothesis and must reference the shift in the distribution.
If the p-value is incorrect, section 4 is scored as P if the response includes proper linkage and a conclusion in context consistent with that p-value.
If the p-value is correct and less than 0.05, wording that states or implies that the alternative hypothesis is proven lowers the score from E to P in section 4.

Scoring
Each essentially correct (E) section counts as 1 point. Each partially correct (P) section counts as ½ point.
4 Complete Response
3 Substantial Response
2 Developing Response
1 Minimal Response
If a response is between two scores (for example, 2½ points), score down.
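The composite scoring rule amounts to a small piece of arithmetic. The Python sketch below is a hypothetical illustration of that arithmetic (E = 1 point, P = ½ point, I = 0, and a total between two scores is rounded down), not part of the official scoring materials. The function name `composite_score` is illustrative, and a total of 0 is returned without a label because the rubric names only scores 1 through 4.

```python
import math
from fractions import Fraction

# Points per section grade, as stated in the rubric.
POINTS = {"E": Fraction(1), "P": Fraction(1, 2), "I": Fraction(0)}
# Labels the rubric gives for whole-number composite scores.
LABELS = {
    4: "Complete Response",
    3: "Substantial Response",
    2: "Developing Response",
    1: "Minimal Response",
}


def composite_score(section_grades):
    """Return (score, label) for four section grades such as "EEPI".

    Half points are scored down, so the total is floored. A total of 0
    has no label in this rubric, so its label is None.
    """
    if len(section_grades) != 4:
        raise ValueError("expected exactly four section grades")
    total = sum(POINTS[g] for g in section_grades)
    score = math.floor(total)
    return score, LABELS.get(score)


print(composite_score("EEEE"))  # (4, 'Complete Response')
print(composite_score("EEPI"))  # (2, 'Developing Response')
```

Using `Fraction` keeps half points exact, so a total such as 2½ floors to 2 without any floating-point rounding concerns.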