Got Sleep?
Question #4

Intent of the Question
The primary goal of this question was to assess a student's ability to identify, set up, perform, and interpret the results of an appropriate hypothesis test to address a particular question. More specific goals were to assess a student's ability to (1) state appropriate hypotheses; (2) identify the appropriate statistical test procedure and check appropriate conditions for inference; (3) calculate the appropriate test statistic and p-value; and (4) draw an appropriate conclusion, with justification, in the context of the study.

Question #4
The National Sleep Foundation conducts an annual survey to track sleep-related behaviors of U.S. adults. In their most recent survey, a random sample of 1,018 adults answered the question "About how much actual sleep would you estimate you typically get on work nights or weeknights?" The frequency table below summarizes the responses by whether they were less than, equal to, or more than the recommended 7 to 9 hours of sleep and by the age group of the respondent.

Typical Weeknight Sleep Time

| Age Group | Less Than 7 Hours | Recommended 7 to 9 Hours | More Than 9 Hours | Total |
| --- | --- | --- | --- | --- |
| 18 – 34 | 44 | 78 | 109 | 231 |
| 35 – 54 | 144 | 200 | 218 | 562 |
| 55 or older | 47 | 77 | 101 | 225 |
| Total | 235 | 355 | 428 | 1,018 |

At the α = 0.05 significance level, do the data provide convincing statistical evidence that there is an association between age group and typical weeknight sleep time for adults in the United States?

Solution Step 1
States a correct pair of hypotheses.
The null hypothesis is that typical weeknight sleep time is independent of (that is, it is not associated with) age group for the population of adults in the United States.
The alternative hypothesis is that typical weeknight sleep time is not independent of (that is, it is associated with) age group for the population of adults in the United States.

Scoring Step 1
Each of steps 1, 2, 3, and 4 is scored as essentially correct (E), partially correct (P), or incorrect (I).
Step 1 is scored as follows:
Essentially correct (E) if the response correctly states both hypotheses with at least one in context.
Partially correct (P) if the response correctly states both hypotheses but not in context, OR the hypotheses were reversed with at least one stated in context.
Incorrect (I) if the response does not meet the criteria for E or P.
Note: If the hypotheses contain language that suggests that the response refers to the sample data (for example, by referring to the adults that answered the question), step 1 is scored as incorrect (I).

Solution Step 2
Identifies a correct test procedure (by name or by formula) and checks appropriate conditions.
The appropriate test is a chi-square test of independence. The conditions for this test were satisfied because:
• The question states that the sample was randomly selected.
• The expected counts for all nine cells of the table were at least 5, as seen in the following table that lists expected counts in parentheses beside the observed counts:

| Typical Weeknight Sleep Time | Age Group 18 – 34 | Age Group 35 – 54 | Age Group 55 or older | Total |
| --- | --- | --- | --- | --- |
| Less Than 7 Hours | 44 (53.3) | 144 (129.7) | 47 (51.9) | 235 |
| Recommended 7 to 9 Hours | 78 (80.6) | 200 (196.0) | 77 (78.5) | 355 |
| More Than 9 Hours | 109 (97.1) | 218 (236.3) | 101 (94.6) | 428 |
| Total | 231 | 562 | 225 | 1,018 |

Scoring Step 2
Essentially correct (E) if the response correctly includes the following three components:
(1) Identifies a chi-square test of independence by name or by formula for the chi-square test statistic.
(2) States AND verifies the random sampling condition.
(3) States AND verifies the technical condition that all expected counts are greater than 5.
Partially correct (P) if the response correctly includes two of the three components listed above.
Incorrect (I) if the response does not meet the criteria for E or P.
Notes:
• If the response identifies the test procedure as a chi-square test of homogeneity of proportions, step 2 does not receive credit for component 1.
• If the response identifies the correct test procedure but gives an incorrect formula for the test statistic, then this is considered a contradiction and does not meet the criteria for component 1.
• Stating the condition that the expected counts must be greater than 5 is not in itself sufficient for satisfying component 3; the condition must be checked by reporting expected counts, or minimally reporting the value of the smallest expected count and indicating that it is at least 5.
• If the response includes an incorrect technical condition such as "n ≥ 30" or "normality," then this will be considered a parallel solution and credit will not be granted for component 3.
• If the response states and verifies the condition that 80 percent of all expected counts must be ≥ 5 and all expected counts must be ≥ 1, then the response can receive credit for component 3.

Solution Step 3
Correct mechanics, including the value of the test statistic and p-value (or rejection region).
The test statistic is calculated from $\chi^2 = \sum \frac{(O - E)^2}{E}$, where $O$ is an observed count, $E$ is the corresponding expected count, and the sum runs over all nine cells; that is,
$\chi^2 = 1.631 + 0.081 + 1.453 + 1.569 + 0.082 + 1.415 + 0.470 + 0.027 + 0.433 = 7.161$.
The p-value is $P(\chi^2 \ge 7.161) = 0.1276$ based on $(3 - 1) \times (3 - 1) = 4$ degrees of freedom.

Scoring Step 3
Step 3 is scored as follows:
Essentially correct (E) if the response correctly calculates the following two components:
(1) Test statistic
(2) p-value or critical value
Partially correct (P) if the response correctly calculates one of the two components listed above.
Incorrect (I) if the response does not meet the criteria for E or P.
Notes:
• When a response has an error in one calculation, subsequent calculations are considered correct if they follow from the initial miscalculation.
• The correct critical value is 9.49 for a significance level of 0.05.

Solution Step 4
States a correct conclusion in the context of the study, using the result of the statistical test.
Because the p-value is greater than the given significance level of α = 0.05, we fail to reject $H_0$.
The data do not provide enough evidence at the 0.05 level of significance to conclude that there is an association between age group and typical weeknight sleep time for adults in the United States.

Scoring Step 4
Essentially correct (E) if the response provides a correct conclusion in context, with justification based on linkage between the p-value and the given α = 0.05.
Partially correct (P) if the response provides a correct conclusion, with linkage to the p-value, but not in context; OR if the response provides a correct conclusion in context, but without justification based on linkage to the p-value.
Incorrect (I) if the response does not meet the criteria for E or P.
Notes:
• The conclusion must be consistent with the hypotheses.
• The conclusion must be related to the alternative hypothesis.
• If the p-value is incorrect, step 4 is scored as E if the response includes proper linkage and a conclusion in context consistent with that p-value.
• If the p-value is incorrect and less than 0.05, wording that states or implies the alternative hypothesis is proven lowers the score one level (that is, from E to P or P to I) in step 4.
• If the p-value is greater than 0.05, wording that states or implies that the null hypothesis is accepted lowers the score one level (that is, from E to P or P to I) in step 4.
• A response including incorrect statistical language (for example, discussing the correlation instead of the association between the two variables) lowers the score one level (that is, from E to P or P to I) in step 4.
• Since α = 0.05 is given, explicit linkage between the size of the p-value and α = 0.05 is required (for example, stating that the p-value is large is not sufficient).

Scoring
Each essentially correct (E) step counts as 1 point. Each partially correct (P) step counts as ½ point.

4  Complete Response
3  Substantial Response
2  Developing Response
1  Minimal Response

If a response is between two scores (for example, 2½ points), score down.
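
Optional check of the mechanics: the expected counts, test statistic, degrees of freedom, and p-value in steps 2 through 4 can be recomputed from the observed counts. The Python sketch below is one way to do so and is not part of the rubric. The function names `chi_square_independence` and `chi_square_sf_even_df` are hypothetical names chosen for illustration. To stay within the standard library, the tail probability uses the closed form that holds only when the degrees of freedom are even, which is the case here (4); a statistics package would normally be used instead.

```python
import math

# Observed counts from the frequency table: rows are age groups,
# columns are sleep categories (< 7 hours, 7 to 9 hours, > 9 hours).
observed = [
    [44, 78, 109],    # 18 - 34
    [144, 200, 218],  # 35 - 54
    [47, 77, 101],    # 55 or older
]


def chi_square_independence(table):
    """Return (expected counts, chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for a cell = (row total * column total) / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum (O - E)^2 / E over every cell of the table.
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df


def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Assumption: df is a positive even integer, so the tail has the
    closed form exp(-x/2) * sum_{j < df/2} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    return math.exp(-half) * sum(
        half ** j / math.factorial(j) for j in range(df // 2)
    )


expected, statistic, df = chi_square_independence(observed)
p_value = chi_square_sf_even_df(statistic, df)
alpha = 0.05
reject_null = p_value < alpha

smallest_expected = min(min(row) for row in expected)
print(f"smallest expected count: {smallest_expected:.1f}")
print(f"chi-square = {statistic:.3f}, df = {df}, p-value = {p_value:.4f}")
print("reject H0" if reject_null else "fail to reject H0")
```

Run as written, the sketch should agree with the solution above: every expected count is at least 5, the statistic rounds to 7.161 on 4 degrees of freedom, and the p-value rounds to 0.1276, which is greater than 0.05.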