Final Exam ST511

Score ________________
Name ________________________
Date__________________________

Because most of the questions are multiple choice, it is important that you round your answers as the LAST step.

1. Old Faithful Geyser in Yellowstone National Park (YNP) derives its name and fame from the regularity of its eruptions. Predicting the times of geyser eruptions is one of the unique aspects of a Yellowstone Park Ranger's job. Predictions are made using statistical methods, and regression analysis in particular. This question uses data from Old Faithful Geyser. The variables in the data set are eruption duration (x, in minutes) and the time interval to the start of the next eruption (y, in minutes). (For example, the point $(x_i, y_i) = (3, 58)$ means that the $i$th eruption had a duration of 3 minutes, and then there was a time interval of 58 minutes to the start of the next eruption.) Theory suggests that the longer the duration of an eruption, the more water and steam is released, and thus the longer it takes for pressure to build up for a subsequent eruption. This suggests the hypothesis that long (short) eruption durations tend to be followed by long (short) intervals to the next eruption. A scatter plot and the results of a regression analysis of the time interval to the subsequent eruption on eruption duration are given below.

$\bar{y} = 71.00901$, $s_y^2 = 163.81892$
$\bar{x} = 3.576126$, $s_x^2 = 1.1749479$

Parameter   Estimate    SE Coef     T-Stat      P-Value
Intercept   33.96676    1.4278698   23.788412   <0.0001
Slope       10.358206   0.3821834   27.102709   <0.0001

Analysis of variance table for regression model:

Source       DF    SS          MS          F-stat     P-value
Regression   1     27859.926   27859.926   734.5569   <0.0001
Error        220   8344.056    37.92753
Total        221   36203.98

1. What is the estimated regression line of interval on duration?
a) y 0 1 x
b) y 1.42786988 0.3821834 x
c) y 1.42786988 0.3821834 x
d) y 33.96676 10.358206 x
e) y 0 1 x

2.
What hypothesis test should be conducted in order to determine if there is a linear relationship between durations and waiting intervals?
a) H0: 0 0 vs HA: 0 0
b) H0: 1 1 vs HA: 1 1
c) H0: 1 0 vs HA: 1 0
d) H0: 1 0 vs HA: 1 0
e) H0: 1 0 vs HA: 1 0

3. What hypothesis test should be conducted in order to determine whether the data support the theory that longer (shorter) durations tend to be followed by longer (shorter) waiting intervals?
a) H0: 1 1 vs HA: 1 1
b) H0: 1 0 vs HA: 1 0
c) H0: 0 12 vs HA: 0 12
d) H0: 1 0 vs HA: 1 0
e) H0: 0 0 vs HA: 0 0

4. What percentage of the variation in waiting time intervals is explained by the preceding eruption durations?
a) $(77^2)\%$
b) ( 77 )%
c) $(23^2)\%$
d) 77%
e) 23%

5. What is the correlation between interval and duration?
a) -0.77
b) 0.77
c) 0.88
d) 88
e) 0.23
f) None of the above

6. To test H0: 1 0 vs. HA: 1 0, which test statistic should be used:
i) $t_{\text{calc}} = 23.788412$
ii) $t_{\text{calc}} = 27.102709$
iii) $F_{\text{calc}} = 734.5569$
a) i only
b) ii only
c) iii only
d) ii and iii
e) i, ii, and iii

7. To test H0: 1 0 vs. HA: 1 0, which test statistic should be used:
i) $t_{\text{calc}} = 23.788412$
ii) $t_{\text{calc}} = 27.102709$
iii) $F_{\text{calc}} = 734.5569$
a) i only
b) ii only
c) iii only
d) i and iii
e) ii and iii

8. Which of the following is a 95% confidence interval for the slope of the linear regression of interval on duration:
a) ˆ1 t 220,0.025ˆ ˆ 1
b) ˆ1 t 220,0.05ˆ ˆ 1
c) ˆ1 t 220,0.025 (37.92753) ( 37.92753 )
d) ˆ t 1 220, 0.025
e) [33.96676 - 1.96(1.4278698), 33.96676 + 1.96(1.4278698)]

9. An eruption ends after a duration of 4 minutes. What is the predicted waiting time until the start of the next eruption? Round to the hundredths place for your final answer.
Answer: ______________

10. Consider the problem of testing H0: 0.6 versus HA: 0.6 using binomial data, Y ~ Bin(n = 1273, ). A scientist reported the test statistic $z_{\text{calc}} = 1.73$. What is the p-value associated with this test statistic?
a) 0.9582
b) 0.9163
c) 0.0418
d) 0.0836
e) 0.0500

11. If we have a random variable X from a normal distribution with mean $\mu$ and variance $\sigma^2$, then we know the following about the sampling distribution of $\bar{x}$:
a) $\bar{x}$ is normally distributed with mean $\mu$ and variance $s^2/n$
b) $\bar{x}$ is normally distributed with mean $\mu$ and variance $s/n$
c) $\bar{x}$ has mean $\mu$ and variance $s/n$, but we don't know if $\bar{x}$ has a normal distribution or not
d) $\bar{x}$ is normally distributed with mean $\mu$ and variance $\sigma^2/n$
e) $\bar{x}$ has mean $\mu$ and variance $\sigma^2/n$, but we don't know if $\bar{x}$ has a normal distribution or not

12. Suppose a die is tossed 5 times. What is the probability of the die coming up 1 exactly 2 times?
a) 0.028
b) 0.161
c) 0.167
d) 0.333
e) There is not enough information to answer this question.

13. Nine hundred (900) high school freshmen were randomly selected for a national survey. Among survey participants, the mean grade-point average (GPA) was 2.7. The population standard deviation is known to be 0.4. What is the margin of error, assuming a 95% confidence level?
a) 0.013
b) 0.026
c) 0.500
d) 1.960
e) None of the above

14. A national achievement test is administered annually to 3rd graders, and scores are known to be normally distributed. The test has a mean score of 100 and a variance of $\sigma^2 = 225$. If Jane's z-score is 1.20, what was her score on the test?
a) 82
b) 88
c) 100
d) 112
e) 118
f) None of the above

15. The stemplot below shows the number of hot dogs eaten by contestants in a recent hot dog eating contest. Which of the following statements are true?
I. The sample range is 70.
II. The sample median is 46.
III. The sample mean is 47.
a) I only
b) II only
c) III only
d) I and II
e) I, II, and III

16. The probability distribution of the random variable pest type, X, is as follows:

Pest type x          1      2      3      4
Probability P(X=x)   0.25   0.50   0.15   0.10

What is the standard deviation, $\sigma$, of the probability distribution?
a) 0.50
b) 0.62
c) 0.79
d) 0.89
e) 2.10
f) None of the above

17.
A public opinion poll surveyed a simple random sample of voters. Respondents were classified by gender (male or female) and by voting preference (Republican, Democrat, or Independent). Results are shown below.

                       Voting Preferences
               Republican   Democrat   Independent   Row total
Male           200          150        50            400
Female         250          300        50            600
Column total   450          450        100           1000

If you conduct a chi-square test of independence, what is the expected frequency count of male Independents?
a) 40
b) 50
c) 60
d) 180
e) 270

18. Suppose X and Y are independent random variables. The variance ($\sigma_X^2$) of X is equal to 16, and the variance ($\sigma_Y^2$) of Y is equal to 9. Let Z = X - Y. What is the standard deviation ($\sigma_Z$) of Z?
a) 2.65
b) 5.00
c) 7.00
d) 25.0
e) It is not possible to answer this question, based on the information given.

19. Twelve identical samples of tissue were divided randomly into three groups of four specimens each. One of three levels of glucose concentration was randomly assigned to each of the three groups, and each specimen in a group was treated with the assigned level of glucose. After a period of time, the insulin released by each tissue sample was recorded. The results appear in the table below.

Concentration
Low      1.49   1.75   3.50   1.87
Medium   3.33   4.21   3.39   2.87
High     1.88   2.02   1.87   2.39

Source            Sum of Squares   Degrees of Freedom   Mean Square   F test
Between samples   i) ????          2                    iii) ????     6.1304
Within samples    3.60587          9                    iv) ????
Total             8.51823          ii) ????

a) Explain what the ANOVA F-test is testing.
H0:
HA:

b) Complete the ANOVA table below SHOWING WORK to support your answers.

Source            Sum of Squares   Degrees of Freedom   Mean Square   F test
Between samples   i) ????          2                    iii) ????     6.1304
Within samples    3.60587          9                    iv) ????
Total             8.51823          ii) ????

i)
ii)
iii)
iv)

20. Suppose we have a dependent variable y and we believe the independent variables x1 and x2 can be used to predict y using a multiple linear regression (MLR) model.
After fitting an MLR model to the data, the following output was obtained:

Analysis of Variance
Source       df   SS             MS            F       P
Regression   2    181369.07387   90684.53693   4.665   0.0205
Error        22   427625.96613   19437.54392
Total        24   608995.04000

Parameter Estimates
Predictor   Coef         SE Coef        T        P
Constant    426.033719   252.81248037   1.685    0.1061
x1          -6.341394    2.61207977     -2.428   0.0238
x2          3.375532     1.92991857     1.749    0.0942

a) Write the estimated model:

b) Based on the t test statistic and the associated p-value for x2, what hypotheses are being tested?
H0:
HA:

c) In non-statistical terms, based on the p-value associated with x2, what can we conclude about x2? Use $\alpha = 0.05$.

d) Can the p-value 0.0238 be used to test H0: 1 0 versus HA: 1 0? Explain why or why not.
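
The printed MLR output in question 20 can be checked for internal consistency with a few lines of arithmetic. The Python sketch below is only an illustration and is not part of the exam: it assumes the figures were transcribed correctly from the output, and all variable names are made up for the example. It recomputes each mean square as SS/df, the F statistic as the ratio of the mean squares, and each T value as Coef divided by SE Coef.

```python
# Consistency check on the printed MLR output (illustrative sketch only).
# All numbers are copied from the output in question 20; the variable
# names are hypothetical and chosen for this example.

ss_regression, df_regression = 181369.07387, 2
ss_error, df_error = 427625.96613, 22

# A mean square is a sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# The overall F statistic is the ratio of the two mean squares.
f_stat = ms_regression / ms_error

# (Coef, SE Coef) pairs from the Parameter Estimates table.
coefficients = {
    "Constant": (426.033719, 252.81248037),
    "x1": (-6.341394, 2.61207977),
    "x2": (3.375532, 1.92991857),
}

# Each T value is the estimate divided by its standard error.
t_stats = {name: coef / se for name, (coef, se) in coefficients.items()}

print(f"F = {f_stat:.3f}")
for name, t in t_stats.items():
    print(f"T({name}) = {t:.3f}")
```

The recomputed values should agree with the F and T columns of the output up to rounding. The p-values are not reproduced here because that would need t and F distribution functions from outside the standard library.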