ST 305 PRACTICE PROBLEMS FOR FINAL EXAM Reiland Topics covered on final exam: Chapters 24-27 in text. This material is covered in webassign homework assignments 12 through 14 Exam information: 3 hour time limit; materials allowed: calculator (no laptops), 1 8 "# x11 sheet of paper (2 sided)with notes, definitions, formulas, etc. Normal, t, and chi-square tables will be provided with the exam. The questions on the final exam will be multiple choice format. Answers are at the end of the document. 1. To investigate the possible link between fluoride content of drinking water and cancer, the cancer death rates (number of deaths per 100,000 population) from 1952-1969 in 20 selected U. S. cities - the ten largest fluoridated cities and the ten largest cities not fluoridated by 1969 - were recorded. These data were used to calculate for each city the annual rate of increase in cancer death rate over this 18 year period. The data are given below: FLUORIDATED City Chicago Philadelphia Baltimore Cleveland Washington Milwaukee St. Louis San Francisco Pittsburgh Buffalo NONFLUORIDATED Annual Increase in Cancer Death Rate 1.0640 1.4118 2.1115 1.9401 3.8772 -.4561 4.8359 1.8875 4.4964 1.4045 City Los Angeles Boston New Orleans Seattle Cincinnati Atlanta Kansas City Columbus Newark Portland Annual Increase in Cancer Death Rate .8875 1.7358 1.0165 .4923 4.0155 -1.1744 2.8132 1.7451 -.5676 2.4471 a. Construct a 95% confidence interval for the difference between the mean annual increases in cancer death rates for fluoridated and nonfluoridated cities. (Use 18 degrees of freedom; do not assume equal variances). b. What assumption(s) must be satisfied for the confidence interval in part a) to be valid? 2a. We are interested in comparing the average supermarket prices of two leading colas. Our sample was taken by randomly selecting eight supermarkets and recording the price of a six-pack of each brand of cola at each supermarket. The data are shown in the following table: Brand 1 cola Brand 2 cola Difference 1 $2.25 $2.30 $-0.05 2 2.47 2.45 0.02 Supermarket 3 4 2.38 2.27 2.44 2.29 -0.06 -0.02 5 2.15 2.25 -0.10 6 2.25 2.25 0 7 2.36 2.42 -0.06 8 2.37 2.40 -0.03 Summary statistics: . œ !Þ!$(&ß =. œ !Þ!$)" Find a 98% confidence interval for the difference in mean price of brand 1 and brand 2. i) !Þ!$(& „ !Þ!%!% ii) !Þ!$(& „ !Þ"$*$ iii) !Þ!$(& „ !Þ!%(" iv) !Þ!$(& „ !Þ!$%( 2b. Perform a hypothesis test to test if the difference in mean prices of brand 1 and brand 2 colas is different from 0. Use α = .02. Define the parameters, state hypotheses, calculate test statistic and T value. ST 305 3. Final Exam Practice Problems page 2 A new weight-reducing technique, consisting of a liquid protein diet, is currently undergoing tests by the Food and Drug Administration (FDA) before its introduction into the market. The weights of a random sample of five people are recorded before they are introduced to the liquid protein diet. The five individuals are then instructed to follow the liquid protein diet for 3 weeks. At the end of this period, their weights (in pounds) are again recorded. The results are listed in the table. Let ." be the true mean weight of individuals before starting the diet and let .# be the true mean weight of individuals after 3 weeks on the diet. 1 2 3 4 5 Weight before diet 156 201 194 203 210 Weight after diet 149 196 191 197 206 Test to determine if the diet is effective at reducing weight. Use α œ Þ"!. 4. A cell phone company wants to determine if the use of text messaging is independent of age. The following data has been collected from a random sample of customers. Under 21 21-39 40 and over Regularly use text messaging 82 57 6 Do not regularly use text messaging 38 34 83 4a. What is the expected value for the “under 21” and “regularly use text messaging” cell. 4b. To conduct a hypothesis test using a 0.01 level of significance, the value of the critical value is: i) 16.812 ii) 15.086 iii) 9.210 iv) 2.576 v) 2.33 4c. What contribution does the cell “21-39” and ”Do not regularly use text messaging” make to the value of the test statistic? 4d. The value of the test statistic is 88.3. The appropriate conclusion is i) reject H! and conclude the variables have a curvilinear relationship; ii) reject H! and conclude the variables are not related, that is, conclude that the variables are independent. iii) do not reject H! and conclude the variables are independent. iv) reject H! and conclude the variables are related. 4e. Use the values of the standardized residuals 9,=/<@/./B:/->/. /B:/->/. to select the correct statement regarding the relationship between text messaging and age. i) The ”Under 21” age group uses text messaging less than the other age groups. ii) For each age group the standardized residual has a negative value and a positive value, so text message and age are not related. iii) The older the age group, the less that text messaging is used. 5. A simple linear regression was used to predict the score y on a final exam from the score x on the first exam. The slope of the least squares regression line is .(&. The standard error of the slope is ."" and the sample size is 200. A 90% confidence interval for the true slope is (though the .0 is 8 #, use 200 .0 in the >-table) a. .'% to .)' b. .&$ to .*( c. .&( to .*$ d. .'% to .)' e. none of the above 6. Refer to the previous problem. To test the null hypothesis that the slope is zero versus the one-sided alternative that the slope is positive, we use the test statistic t œ a. 6.82 b. .05 c. .15 d. .95 e. .75 ST 305 7. Final Exam Practice Problems page 3 When two competing teams are equally matched, the probability that each team wins any game is !Þ&!Þ The National Basketball Association (NBA) championship goes to the team that wins four games in a best-of-seven series. If the teams were evenly matched, the probability that the final series ends with one of the teams sweeping four straight games would be #Ð!Þ&Ñ% œ !Þ"#& [team A wins in 4 games with probability Ð!Þ&Ñ% à team B can also win in 4 games with the same probability, so the probability the series ends in 4 games is #Ð!Þ&Ñ% ]. Similarly team A can win in 5 games if team A wins 3 of the first 4 games and then wins game 5. So team A wins in 5 games with probability % G$ Ð!Þ&Ñ$ Ð!Þ&чÐ!Þ&Ñ œ !Þ"#&. But team B can also win in 5 games with the same probability, so the probability that the series ends in 5 games is !Þ#&. Similar probability calculations show that the probability is !Þ$"#& that the series lasts six games, and the probability is !Þ$"#& that the series lasts the full seven games. The table below shows the number of games it took to decide each of the last 57 NBA champs. Do you think the teams are usually equally matched? Give statistical evidence to support your conclusion. Length of series NBA finals 8. 4 games 7 6 games 22 7 games 15 The president of a large university has been studying the relationship between male/female supervisory structures in his institution and the level of employees' job satisfaction. The results of a recent survey are shown in the table below. Conduct a test at the 5% significance level to determine whether the level of job satisfaction depends on the boss/employee gender relationship. Level of Satisfaction Satisfied Neutral Dissatsfied 9. 5 games 13 Male/Female 60 27 13 Boss/Employee Female/Male 15 45 32 Male/Male 50 48 12 Female/Female 15 50 55 (True or false) In a hypothesis test, a T -value of !Þ!$ means that there is only probability !Þ!$ that the null hypothesis is true. 10. A chi-squared test for independence with 6 degrees of freedom results in a test statistic 13.61. Using the tables, the most accurate statement that can be made about the P-value for this test is that: a. P-value > .10 b. P-value > .05 c. .05 < P-value < .10 d. .025 < P-value < .05 11. Sociologists are of the opinion that there has been a decrease in the difference in ages at first marriage for men and women since 1975. We want to examine data to determine if this decrease is significant. The following data summary and regression results were obtained, where the B variable is year and the C variable is the age difference (husband age wife age) at first marriage. Variable Count Year (B) 24 husband-wife age (C) 24 Mean 1986.5 2.3125 StDev 7.071 0.249 ST 305 Final Exam Practice Problems page 4 a. b. c. Interpret the value of the least squares slope ," . What is the value of the test statistic for testing L! À "" œ !? For the hypothesis test L! À "" œ ! vs L+ À "" !ß select the choice below that gives the correct T -value and correct conclusion. i. The T -value is 0.68; do not reject L! À "" œ !; there is no linear relationship since 1975 between year and age difference between husband and wife at first marriage. ii. The T -value is 0.000152; reject L! À "" œ !; there is evidence that since 1975 the age difference (husband age wife age) has increased. iii The T -value is 0.0001275; reject L! À "" œ !; there is evidence that since 1975 the age difference (husband age wife age) has decreased. iv The T -value is 0.0001275; do not reject L! À "" œ !; there is no linear relationship since 1975 between year and age difference between husband and wife at first marriage. d. e. What is a 95% confidence interval for the slope? Find a 95% confidence interval for the mean difference (husband age wife age) at first marriage in 1998. Find a 95% prediction interval for the difference (husband age wife age) for a particular couple getting married for the first time in 1998. f. 12. Over 6 decades the Gallup Organization has periodically asked the following question: If your party nominated a generally well-qualified person for president who happened to be a woman, would you vote for that person? Below is a table showing the percentage answering “yes” and the year of the century (37 = 1937). % Yes 92 82 78 80 76 73 66 53 57 55 57 54 52 48 33 33 Year 99 87 84 83 78 75 71 69 67 63 59 58 55 49 45 37 summary statistics: B œ '(Þ%% a. b. c. =B œ "'Þ( C œ '"Þ)" =C œ "(Þ"* < œ Þ*(" Determine the estimates ,! and ," of the parameters "! and "" in the linear model y œ "! "" x %, where B is year and C is the percentage who respond “yes”. Use the least squares line to estimate the percentage of respondents that would say “yes” in 1997. Determine the estimate =/ of the standard deviation 5 of the error component % ST 305 Final Exam Practice Problems page 5 (note that the sum of squares of residuals ÐC3 sC3 Ñ# œ 255.748). "' 3œ" d. e. f. Calculate a 95% confidence interval for the slope "" . (Note that WIÐ," Ñ œ / ) 8" =B = Conduct an appropriate hypothesis test (use α = .05) to determine if the year of the century is useful for predicting the percentage of respondents that would answer “yes” to the above question. State the hypotheses, find the value of the test statistic, and state your conclusion based on the Pvalue or the rejection region. What assumptions must be (approximately) satisfied for the above statistical procedures to be valid? Perform 2 diagnostics to check these assumptions. SOLUTIONS 1. a. B" œ 2.2573, s21 œ 2.753, n" œ 10; B# œ 1.3411, s22 œ 2.429, n# œ 10, t.025,") œ 2.101, (2.2573 1.3411) „ 2.101 2.753 10 2.429 10 œ .9162 „ 1.5124 œ ( .5962, 2.4286). b. (i) The two samples must be independently selected (ii) The distributions of increases in cancer death rates for fluoridated and nonfluoridated cities can be approximated by normal models. (iii) The two groups we are comparing must be independent of each other. 2a. i). . „ >Þ!"ß( =.8 œ !Þ!$(& „ #Þ**)! !Þ!$)" œ !Þ!$(& „ !Þ!%!%Þ ) 2b. L! À ." .# œ .. œ ! LEÀ ." .# œ .. Á ! α œ Þ!# where ." is the mean price of brand 1 cola and .# is the mean price of brand 2 cola. test statistic: > œ Þ!$(& œ #Þ()%. Þ!$)" ) 3. T @+6?/ œ #T Ð>( #Þ()%Ñ œ Þ!#(Þ Since T @+6?/ Þ!#, do not reject the null hypothesis L! À ." .# œ .. œ !. There appears to be no significant difference in the mean price of the 2 brands of soda. L! À ." .# œ .. œ !ß LE À ." .# œ .. !à α œ Þ"! . œ &ß =. œ "Þ&)Þ & > œ =.. œ "Þ&) œ (Þ!)à T -value œ T Ð>% (Þ!)Ñ ¸ Þ!!"Þ 8 & Conclusion: Reject L! and conclude that the mean weight of individuals after the diet is less than the mean weight of individuals before the diet. 4. Under 21 21-39 40 and over 4a. 4b. 4c. 4d. 4e. Regularly use text messaging 82 (58) 57 (43.983) 6 (43.017) Do not regularly use text messaging 38 (62) 34 (47.017) 83 (45.983) ‡ -96?78 >9>+6 expected count: <9A >9>+6 œ &)Þ 1<+8. >9>+6 iii) 9.210 (.0 œ Ð$ "чÐ# "ÑÑ # Ðobserved - expected)# œ Ð$%%(Þ!"(Ñ œ $Þ'!% expected %(Þ!"( iv) reject H! and conclude the variables are related. iii) The older the age group, the less that text messaging is used. (see table below) ST 305 Final Exam Practice Problems standardized residuals Under 21 21-39 40 and over Regularly use text messaging $Þ"& "Þ*' &Þ'% page 6 9,=/B: /B: Do not regularly use text messaging $Þ!& "Þ*! &Þ%' 5. c ," „ >‡ WIÐ," Ñ œ Þ(& „ "Þ'&#&‡Þ"" ," 6. a > œ WIÐ, œ Þ(& Þ"" "Ñ 7. (chap 26) L! À teams are evenly matched L+ À teams are not evenly matched The table below shows the observed values in each cell and the expected cell values in parentheses if the teams are evenly matched. Length of series NBA finals Test statistic: # \ # œ Ð((Þ"#&Ñ (Þ"#& 4 games 7 (7.125) Ð"$"%Þ#&Ñ# "%Þ#& 5 games 13 (14.25) Ð##"(Þ)"#&Ñ# "(Þ)"#& 6 games 22 (17.8125) Ð"&"(Þ)"#&Ñ# "(Þ)"#& 7 games 15 (17.8125) œ "Þ&% Rejection region: if the null hypothesis is true, the test statistic \ # has a chi-square distribution with Ð5 "Ñ œ Ð% "Ñ œ $ .0 . If α œ Þ!&, the RR is \ # (Þ)"&. Note that T @+6?/ œ !Þ'( 8. Conclusion: do not reject L! . There is no evidence that the NBA championship series are inconsistent with the conjecture that the teams are evenly matched. (chap 26) Expected cell counts are in parentheses: Boss/Employee Level of Satisfaction Male/Female Female/Male Male/Male Female/Female Satisfied 60 (33.175) 15 (30.521) 50 (36.493) 15 (39.81) Neutral 27 (40.284) 45 (37.062) 48 (44.313 50 (48.341) Dissatsfied 13 (26.54) 32 (24.417) 12 (29.194) 55 (31.848) L! À Boss/employee relationship and job satisfaction are independent L+ À Boss/employee relationship and job satisfaction are dependent Test statistic: ;# œ Ð9,=/<@/./B:/->/.Ñ œ *#Þ(!*; degrees of freedom œ Ð3 "чÐ4 "Ñ œ ' /B:/->/. "# # " T @+6?/ œ Þ!!!. Conclusion: reject H! and conclude that boss/employee relationship and job satisfaction are related. 9. false 10. d 11. a. ," œ !Þ!#% (approximately); this means that each year since 1975 the average difference (husband age wife age) has decreased by Þ!#% ," b. > œ WIÐ, œ !Þ!#$*&'ÞÞÞ !Þ!!&&!$ÞÞÞ œ %Þ$&$"" "Ñ c. iii. The T -value given in the Excel output is always for a 2-tail test; since we are conducting a 1tail test L+ À "" !, the T -value is !Þ!!!#&& œ !Þ!!!"#(& # d. Ð !Þ!$&$'*(ß !Þ!"#&%$$&Ñ from the output; notice that the interval is entirely negative. e. use sC"**) „ >‡8# WIÐ. s"**) Ñ; for the calculations below, note that from the output we have =/ œ !Þ")''#'%*; sC"**) œ %*Þ*!#"$!%$ !Þ!#$*&'&##Ð"**)Ñ œ #Þ!$( ST 305 Final Exam Practice Problems WIÐ. s"**) Ñ œ WI # Ð," Ñ ‚ ÐB/ BÑ# =#/ 8 WIÐC s"**) Ñ œ WI # Ð," Ñ ‚ ÐB/ BÑ# =#/ 8 œ ÐÞ!!&&!$$"&Ñ# ‚ Ð"**) "*)'Þ&Ñ# page 7 !Þ")''#'%*# #% œ Þ!($)'))*à >‡8# WIÐ. s"**) Ñ œ #Þ!(%Ð Þ!($)'))*Ñ œ Þ"&$#!%à so sC"**) „ >‡8# WIÐ. s"**) Ñ œ #Þ!$( „ Þ"&$#!% Ê ("Þ))$(*', #Þ"*!#!%) f. use sC"**) „ >‡8# WIÐC s"**) Ñ; for the calculations below, note that from the output we have =/ œ !Þ")''#'%*; sC"**) œ %*Þ*!#"$!%$ !Þ!#$*&'&##Ð"**)Ñ œ #Þ!$(à ÐÞ!!&&!$$"&Ñ# ‚ Ð"**) "*)'Þ&Ñ# =#/ œ !Þ")''#'%*# #% !Þ")''#'%*# œ Þ#!!("%à >‡8# WIÐC s"**) Ñ œ #Þ!(%ÐÞ#!!("%Ñ œ Þ%"'#)à so sC"**) „ >‡8# WIÐC s"**) Ñ œ #Þ!$( „ Þ%"'#) Ê "Þ'#!(#ß #Þ%&$#) = 12. a. ," œ < =BC œ Þ*(" "(Þ"* œ Þ***%*à ,! œ C ," B œ '"Þ)" Þ***%*Ð'(Þ%%Ñ œ &Þ&*$&); "'Þ( b. sC*( œ &Þ&*$&) Þ***%*Ð*(Ñ œ *"Þ$' #&&Þ(%) c. =/ œ WWI œ %Þ#(%; 8# œ "% d. WIÐ," Ñ œ / = %Þ#(% œ Þ!''"; 8" =B "&‡"'Þ( confidence interval is ," „ >‡8# WIÐ," Ñ œ Þ***%* „ #Þ"%&ÐÞ!''"Ñ œ Þ***%* „ Þ"%"() Ê ÐÞ)&(("ß "Þ"%"#(Ñ e. L! À "" œ ! vs L+ À "" Á !à ," test statistic > œ WIÐ, œ Þ***%* Þ!''" œ "&Þ"#à "Ñ for α œ Þ!&ß the rejection region is > #Þ"%& and > #Þ"%& (8 # œ "' # œ "% .0 Ñ; the T -value is 0 to nine decimal places. Conclusion: since the test statistic is in the rejection region, reject L! À "" œ ! and conclude that year is useful for predicting the percentage of respondents that will answer “yes” to the question. f. In the model C3 œ "! "" B3 %3 , the error terms %3 are assumed to be independent and normally distributed with mean 0 and standard deviation 5 for all B, that is, %3 µ iid R Ð!ß 5Ñ for all BÞ = Below is the least squares line on the scatterplot, a plot of the residuals against the B-values, and a histogram of the residuals. The first 2 plots cast some doubt on the assumption of independent errors since the residuals appear to occur in groups of positive and negative residuals. The histogram casts some doubt on the normality of the errors since the residuals show some left-skewness. %Yes vs Year 105 y = 0.9994x ‐ 5.5827 R² = 0.9423 95 85 75 65 55 45 35 25 35 45 55 65 75 85 95 105 ST 305 Final Exam Practice Problems Residual Plot 10 Residuals 5 0 ‐5 30 50 70 ‐10 ‐15 Year 90 110 page 8