Study Guide for Final Exam STAT 303

STAT 303: Final Exam
Name:__________________________

1. Consider the following t-test output from a breast cancer dataset from University of WI – Madison. There were two groups in this dataset (Malignant and Benign). Of interest here is the study of the Texture measurement of the cell. You can see from the JMP output that the confidence interval for the true difference in the average texture measurements between the malignant cells and the benign cells goes from 3.02 up to 4.36. Identify whether the following statements about this interval are correct.

Differences were computed as follows: Difference = Malignant - Benign

a. We are 95% certain that the true difference in the average texture measurements is between 3.02 and 4.36.
TRUE   FALSE
b. This interval provides enough statistical evidence to say that there is a difference in the average texture measurements between malignant and benign cells.
TRUE   FALSE
c. The average difference from this data, i.e. 3.69, is guaranteed to be in our 95% confidence interval (i.e. 3.02 up to 4.36).
TRUE   FALSE

2. The smaller the p-value, the more evidence we have to support the research question.
TRUE   FALSE

3. The standard two-sample pooled t-test for comparing two averages is not valid if the variability in one group is very different than the variability in the other group.
TRUE   FALSE

4. The two-sample t-test for comparing two averages is only valid when the number of observations in each group is the same.
TRUE   FALSE

Multiple Choice Questions

Consider the MN_WI_IA_Colleges data file. The investigation here centers around whether or not the percentage of alumni who donate money is different for private and public schools. The summary statistics for each group are provided here.

5. Consider the summaries above for the Private and Public schools.
a. The mean and median are larger for Private because this group has more observations in it (N = 29 vs. N = 21).
b. The mean and median are larger for Private, which means that alumni from Private institutions tend to donate more often than alumni from Public institutions.
c. Both a. and b.

6. Consider the summaries above for the Private and Public schools.
a. The standard deviation for Private is larger; thus, the percentage of alumni who donate from Private institutions is likely to vary more than the percentage of alumni who donate from Public institutions.
b. The range for Private is larger; thus, the percentage of alumni who donate from Private institutions is likely to vary more than the percentage of alumni who donate from Public institutions.
c. Both a. and b.

7. Consider the following margin-of-error computations for each group. Answer the following.

Margin of Error = 2 * Standard Error, where Standard Error = Standard Deviation / sqrt(N)
Margin of Error for Private = 2 * 2.54 = 5.08
Margin of Error for Public = 2 * 1.82 = 3.64

a. The margin-of-error for Public is smaller because the number of observations in this group is smaller.
b. The margin-of-error for Private is larger because the variation in this group is larger.
c. Both a. and b.

8. Consider the following mean ± margin-of-error values for each group. Answer the following.

Mean ± MOE for Private = 30.17 ± 5.08, i.e. 25.09 up to 35.25
Mean ± MOE for Public = 16.57 ± 3.64, i.e. 12.93 up to 20.21

a. These formulas and their corresponding intervals allow us to compare the true alumni donation rate for private and public institutions in any state.
b. These formulas and their corresponding intervals allow us to compare the dollar amount that each private school alum donates to the dollar amount that each public school alum donates for institutions in Minnesota, Wisconsin, and Iowa.
c.
These formulas and their corresponding intervals are necessary to compare the average alumni donation rate from the private schools in our sample to the average alumni donation rate from the public schools in our sample (i.e. comparing the 30.17 to the 16.57).
d. None of the above.

Suppose two different studies are done to investigate the average time it takes to fall asleep for women and men. The results from these two studies are given here.

Study A
Statistic            Women        Men
Average              18 minutes   12 minutes
Standard Deviation   Not computed, but similar for both groups
Count                100          100

Study B
Statistic            Women        Men
Average              18 minutes   15 minutes
Standard Deviation   Not computed, but similar for both groups
Count                100          100

9. Which study provides stronger evidence that there is a difference in the time it takes to fall asleep between women and men?
a. Study A
b. Study B
c. The strength of evidence would be similar for these two studies.

Suppose two more studies are conducted on this issue. The outcomes from these studies are given below.

Study C
Statistic            Women        Men
Average              18 minutes   15 minutes
Standard Deviation   Not computed, but similar for both groups
Count                200          200

Study D
Statistic            Women        Men
Average              18 minutes   15 minutes
Standard Deviation   Not computed, but similar for both groups
Count                100          100

10. Which study provides stronger evidence that there is a difference in the time it takes to fall asleep between women and men?
a. Study C
b. Study D
c. The strength of evidence would be similar for these two studies.

11. For simplicity, suppose that the standard deviation in the four previously mentioned studies is the same. Answer the following.
a. The p-value for Study A is likely to be larger than the p-value for Study B.
True   False
b. The p-value for Study C is likely to be larger than the p-value for Study B.
True   False

12.
You plan to use a random sample of students at your school to investigate a claim that the average amount spent on the most recent haircut by the population is more than $15. Why would a large (random) sample be better than a small one? (3 pts)
a. Because you’re more likely to get extreme haircut prices with a larger sample.
b. Because larger samples produce less variability in haircut prices within one sample.
c. Because larger samples produce less variability in average haircut prices from sample to sample.

Consider the following example in which two doctors were compared in their ability to identify the number of lymph nodes in patients. Differences were computed as Doctor A – Doctor B.

[Data and JMP Output]

Research Question: Do these two doctors differ in the average number of lymph nodes found?

13. Circle the most appropriate conclusion for the outcome of this test.
a. We are 95% certain that there is not enough evidence to suggest that these doctors disagree in the average number of lymph nodes found for each patient.
b. We are 95% certain that these two doctors will find on average a different number of lymph nodes for the patients in this study.
c. We are 95% certain that any two doctors will find on average a different number of lymph nodes for men with AIDS or an AIDS related condition.
d. We are 95% certain that these two doctors will find on average a different number of lymph nodes for men with AIDS or an AIDS related condition.

14. Circle the most appropriate interpretation of the 95% confidence interval. You can assume that the lower endpoint of the interval is about 2 and the upper endpoint is about 4.
a. We are 95% certain that on average Doctor B will find about 2 to 4 more lymph nodes per patient than Doctor A.
b. We are 95% certain that on average Doctor A will find about 2 to 4 more lymph nodes per patient than Doctor B.
c.
We are 95% certain that on average Doctor A will find about 2 to 4 more lymph nodes than Doctor B.
d. We are 95% certain that this interval suggests no difference between these two doctors.

15. The data below come from the 2005 Youth Risk Behavior Survey of American high school students, which was conducted by a branch of the United States government. This study investigated the potential relationship between smoking and body mass index (BMI). Body mass index is one possible measurement of one’s fitness level. Generally speaking, the smaller the number, the more fit someone is.

Current Smoker       Yes      No
Average BMI Level    23.92    23.51

Research Question: "Does the average BMI differ between American high school students who smoke and those who do not smoke?"

a. Describe the population of interest in this study.
b. One variable of interest is whether the subject was a current smoker, or not. Is this variable categorical, or numeric?
Numerical   Categorical
c. The other variable of interest is the subject's BMI. Is this variable categorical, or numeric?
Numerical   Categorical
d. This study did not involve randomly assigning subjects to the two groups being compared. This lack of random assignment hinders our ability to say that smoking causes changes in average BMI levels.
TRUE   FALSE

16. Evaluate each of the sentences below as "True" or "False" assuming that the p-value from the appropriate analysis is 0.2561.
a. There is evidence (using a 5% error rate) that the average BMI differs between American high school students who smoke and those who do not smoke.
TRUE   FALSE
b. There is evidence (using a 5% error rate) that the average BMI differs between all high school students who smoke and those who do not smoke.
TRUE   FALSE
c. There are far fewer smokers than nonsmokers. So, we cannot establish statistical significance in this study because the groups being compared must be of the same size.
TRUE   FALSE

17. For this problem, the relationship between the Amount of Taxes ($) of a home and the Square Foot (ft²), i.e. size of home, is under consideration.

Response Variable: Taxes
Predictor Variable: SquareFeet

Mean and Variance Function
- $E(\mathit{Taxes} \mid \mathit{SquareFeet}) = \beta_0 + \beta_1 \cdot \mathit{SquareFeet}$
- $\mathrm{Var}(\mathit{Taxes} \mid \mathit{SquareFeet}) = \sigma^2$

Consider the following output from an analysis done in JMP.

[Scatterplot with simple linear regression line]
[Summary of Fit and ANOVA output]
[Output regarding parameter estimates]

Answer the following questions using the output provided above.
a. Write out the estimated mean function that can be used to predict taxes as a function of square foot of home.
b. Interpret, in context and using layman’s language, the estimated slope.
c. Interpret, in context and using layman’s language, the estimated y-intercept.
d. What is the purpose of the t Ratio and Prob > |t| quantities for SquareFeet in the Parameter Estimates portion of the output? Discuss.
e. What is the purpose of the R² value? Interpret this value in context of this problem.

18. In a regression analysis, what is the purpose of the following plot? On this plot, provide a rough sketch of a “good” pattern that we’d like to see in this plot.

19. In a regression analysis, what is the purpose of the following plot? On this plot, provide a rough sketch of a “good” pattern that we’d like to see in this plot.
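
Worked check for questions 7 and 8 (illustrative only): the margin-of-error rule stated in question 7, Margin of Error = 2 * Standard Error, can be reproduced with a few lines of Python. This is a minimal sketch that uses only the means and standard errors quoted in those questions. The function name mean_pm_moe and the groups dictionary are hypothetical names chosen for this example and are not part of the JMP output.

```python
def mean_pm_moe(mean, std_error, multiplier=2):
    """Return (margin_of_error, lower, upper) using MOE = multiplier * SE.

    The multiplier of 2 follows the rule given in question 7.
    """
    moe = multiplier * std_error
    return moe, mean - moe, mean + moe


# Values quoted in questions 7 and 8 (percentage of alumni who donate).
# The dictionary layout is an assumption made for this sketch.
groups = {
    "Private": {"mean": 30.17, "std_error": 2.54},
    "Public": {"mean": 16.57, "std_error": 1.82},
}

results = {
    name: mean_pm_moe(g["mean"], g["std_error"]) for name, g in groups.items()
}

for name, (moe, lower, upper) in results.items():
    print(f"{name}: MOE = {moe:.2f}, interval = {lower:.2f} up to {upper:.2f}")
```

Running the sketch reproduces the intervals listed in question 8 (25.09 up to 35.25 for Private and 12.93 up to 20.21 for Public). It checks the arithmetic only and does not answer any of the interpretation questions.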