Math 258: Intro to Statistics B
Final Exam, Spring 2023

Name:

Please read the instructions carefully.
• You have 120 minutes to complete the exam.
• Including this page, your exam booklet should have 9 pages.
• Read each question carefully. You must show all necessary work to receive full credit.
• You are permitted the use of a calculator – cell phones are not allowed.
• No textbooks, notes, or neighboring students may be referenced during the exam.
• Round all calculations to 3 decimal places unless otherwise instructed.

Q        Score
1
2
3
4
5
Total

1) Age of the Bride

The data below represent the ages of brides, in years, from a random sample of 20 marriage certificates filed in Cook County, Illinois.

35 27 24 30 24 27 31 24 23 22
25 23 28 22 26 21 33 25 24 27

Note: Summary statistics and a normality test for this data can be found in the Minitab Output.

1a) Use the sign test to determine whether the median age of brides in Cook County, Illinois is less than the national average age of 25.8 years, at level of significance α = 0.05.
• State the hypotheses to be tested.
• Calculate the test statistic k₀.
• Identify the appropriate critical value(s).
• State both your formal and proper conclusions.

1b) Use the one-sample t-test to determine whether the mean age of brides in Cook County, Illinois is less than the national average age of 25.8 years, at level of significance α = 0.05.
• State the hypotheses to be tested.
• Calculate the test statistic t₀.
• Identify the appropriate critical value(s), and sketch the rejection region for this test.
• State both your formal and proper conclusions.

1c) Which of the above tests is more appropriate for the given sample? Justify your answer.

2) Comparing Athletes

The resting heart rates, in beats per minute (bpm), were measured for independent random samples of 11 swimmers and 9 track athletes, producing the following data.

Swimmers   Track
79         82
66         81
62         70
75         70
79         75
78         72
77         68
71         63
75         65
68
77

All 20 values in ascending order:
62 63 65 66 68 68 70 70 71 72 75 75 75 77 77 78 79 79 81 82

Note: Summary statistics and a normality test for this data can be found in the Minitab Output.

2a) Use the Mann-Whitney test to determine whether the median resting heart rate of swimmers is greater than that of track athletes, at level of significance α = 0.05.
• State the hypotheses to be tested.
• Calculate the test statistic U.
• Identify the appropriate critical value(s).
• State both your formal and proper conclusions.

2b) Use Welch’s t-test to determine whether the mean resting heart rate of swimmers is greater than that of track athletes, at level of significance α = 0.05.
• State the hypotheses to be tested.
• Calculate the test statistic t₀.
• Identify the appropriate critical value(s), and sketch the rejection region for this test.
• State both your formal and proper conclusions.

2c) Which of the above tests is more appropriate for the given sample? Justify your answer.

3) The Salary Gap

The following data represent the annual salaries, in thousands of dollars, for independent random samples of non-minority women, non-minority men, and minority individuals employed by a particular shoe company.

Women   Men   Minorities
23      45    18
41      55    30
54      60    34
60      70    41
78      72    44

All 15 values in ascending order:
18 23 30 34 41 41 44 45 54 55 60 60 70 72 78

3a) Use the Kruskal-Wallis test to determine whether the median annual salary at this company differs between women, men, and minorities, at level of significance α = 0.05.
• Calculate the test statistic H.
• Identify the appropriate critical value.
• State both your formal and proper conclusions.

3b) Use one-way ANOVA to determine whether the mean annual salary at this company differs between women, men, and minorities, at level of significance α = 0.05.
• Complete the ANOVA table below to calculate the test statistic F₀.

Source      df      SS        MS      F
Treatment   ____    1884.13   ____    ____
Error       ____    2615.20   ____

• Identify the appropriate critical value, and sketch the rejection region for this test.
• State both your formal and proper conclusions.

3c) Which of the above tests is more appropriate for the given sample? Justify your answer.
Note: Summary statistics and a normal probability plot for this data can be found in the Minitab Output. Remember to check both of the necessary conditions for ANOVA.

3d) Use pairwise Mann-Whitney tests with Bonferroni adjustment to determine which pairs of treatments, if any, have significantly different medians, at level of significance α = 0.05.
• The level of significance to be used in each individual comparison is α/m = ______
• Circle the correct conclusions for each comparison.

Comparison             Mann-Whitney P-value   Formal Conclusion    Population Medians are...
women vs. men          0.4633                 (reject / FTR) H₀    (similar / different)
women vs. minorities   0.1732                 (reject / FTR) H₀    (similar / different)
men vs. minorities     0.0122                 (reject / FTR) H₀    (similar / different)

3e) Use Tukey’s test to determine which pairs of treatments, if any, have significantly different means, at level of significance α = 0.05.
• Calculate the test statistics q₀(i, j).
• Identify the appropriate critical value.
• Circle the correct conclusions for each comparison.

x̄_i − x̄_j                      q₀(i, j)   q_{α,ν,k}   Formal Conclusion    Population Means are...
x̄_men − x̄_minorities = 27.0    ____       ____        (reject / FTR) H₀    (similar / different)
x̄_women − x̄_minorities = 17.8  ____       ____        (reject / FTR) H₀    (similar / different)
x̄_men − x̄_women = 9.2          ____       ____        (reject / FTR) H₀    (similar / different)

3f) Which of the above techniques is more appropriate for the given sample? Justify your answer.

Bonus: Based on whichever set of multiple comparisons is most appropriate, use the underlining technique to identify groups of treatments with similar means/medians.

men     women     minorities

4a) Effective Marketing

To compare the effectiveness of a particular company’s national marketing campaign in two different cities, researchers randomly selected individuals from each city and asked them to report their level of awareness of the company’s product. Out of 200 people sampled in New York, 91 reported they had purchased or were aware of the company’s product. Out of 300 people sampled in Los Angeles, 132 had purchased or were aware of the product.

Use the two-sample normal approximation test of proportions to determine whether there is a difference in the proportions of people in these cities who are familiar with this product, at level of significance α = 0.05.
• State the hypotheses to be tested.
• Calculate the test statistic z₀.
• Identify the appropriate critical value(s), and sketch the rejection region.
• State both your formal and proper conclusions.

4b) Effective Marketing

The complete set of data collected for the above study is provided in the table below. Researchers randomly sampled individuals from New York, Los Angeles, and Chicago, and asked them to report whether they had purchased, were aware of, or were not aware of the company’s product.

Use the appropriate chi-square test (independence or homogeneity) to determine whether there is a difference in the level of awareness of this company’s product among these three cities, at level of significance α = 0.05.

                               Awareness
City               Purchased   Aware     Not Aware   (Total)
New York           36          55        109         200
  (Expected)       41.54       ____      100.31
  (Contribution)   0.74        ____      0.75
Chicago            45          56        49          150
  (Expected)       31.15       43.62     ____
  (Contribution)   6.15        3.52      ____
Los Angeles        54          78        168         300
  (Expected)       ____        87.23     150.46
  (Contribution)   ____        0.98      2.04
(Total)            135         189       326         650

• Identify which test is to be performed.
• Calculate any missing expected counts, and contributions to the test statistic.
• Calculate the test statistic χ²₀.
• Identify the appropriate critical value, and sketch the rejection region.
• State both your formal and proper conclusions.
• Identify which cell deviates most from H₀.

5) Boiling at Altitude

In the 1850s, Scottish physicist James Forbes recorded the barometric pressure (in inches of mercury, in Hg) and the boiling point of water (in degrees Fahrenheit) at different points in the Swiss Alps. A simple linear regression model was fit to this data, treating barometric pressure as predictor and boiling point as response.

Note: The output from this analysis is provided in the Minitab Output.

• State the estimated equation of the regression line for these variables.
• What should we expect the boiling point at altitude to be when the barometric pressure is 26 in Hg?
• Identify and interpret the coefficient of determination for this model.
• Construct and interpret a 95% confidence interval for the slope β₁. Is this relationship significant? Justify your answer.
• Construct and interpret the appropriate interval to estimate the mean boiling point of water in the Swiss Alps when the barometric pressure is 26 in Hg.
• Use the Minitab Output to assess model conditions for this data. Briefly justify your answers.
  ◦ (Yes / No) Does a linear model appear to be appropriate for the given data?
  ◦ (Yes / No) Do the residuals appear to have constant variance as x varies?
  ◦ (Yes / No) Do the residuals appear to be normally distributed?
  ◦ (Yes / No) Would it be appropriate to use this model for prediction or inference?

Final (Practice) Exam: Minitab Express Output

Question 1: [Minitab output]
Question 2: [Minitab output]
Question 3: [Minitab output]
Question 5: [Minitab output]
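
Worked check for Question 4b (illustrative only): the expected counts and contributions in the 4b table follow the usual rule, expected = (row total × column total) / grand total and contribution = (observed − expected)² / expected. The following is a minimal Python sketch of that arithmetic, assuming the observed counts read off the table (including 49 for Chicago, Not Aware, which agrees with the row and column totals). The variable names are hypothetical and not part of the exam, and the sketch does not look up the critical value or state a conclusion.

```python
# Observed counts from the Question 4b table.
# Rows are cities, columns are awareness levels.
cities = ["New York", "Chicago", "Los Angeles"]
levels = ["Purchased", "Aware", "Not Aware"]
observed = [
    [36, 55, 109],   # New York
    [45, 56, 49],    # Chicago (49 assumed from the table totals)
    [54, 78, 168],   # Los Angeles
]

# Marginal totals.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under H0: (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Contribution of each cell: (observed - expected)^2 / expected.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# The test statistic is the sum of all cell contributions.
chi_square = sum(sum(row) for row in contributions)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(cities) - 1) * (len(levels) - 1)

# Cell with the largest contribution, i.e. the largest deviation from H0.
largest = max(
    (contributions[i][j], cities[i], levels[j])
    for i in range(len(cities))
    for j in range(len(levels))
)

for i, city in enumerate(cities):
    for j, level in enumerate(levels):
        print(f"{city:12s} {level:10s} "
              f"expected={expected[i][j]:7.2f} "
              f"contribution={contributions[i][j]:5.2f}")
print(f"chi-square = {chi_square:.3f}, df = {df}")
print(f"largest contribution: {largest[1]}, {largest[2]}")
```

The printed expected counts and contributions can be compared against the values already filled in on the table (for example 41.54 and 0.74 for New York, Purchased) before completing the blank cells by hand.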