STAT 452/652 EXAM 2
DUE Wednesday, November 28 at class time

In all problems, follow the steps if any are given, and answer all questions asked. Please be very concise and precise in what you write. Please do not include the same information twice: be gentle on the trees! No more than 6 pages will be graded. Do all tests of significance at the 0.05 significance level. Each part of every problem is worth 2 points. RETURN THE EXAM PAGE WITH YOUR ANSWERS. GOOD LUCK!

Problem 1. You will work with the data in the MINITAB project exam2_data.MPJ on the Gauss classdata drive, in the folder labeled math462_652. The data contain y = energy content of waste (in kcal/kg), three composition variables for waste (Plastics = % plastics by weight, Paper = % paper by weight, Garbage = % garbage by weight), and Water = % water content by weight. We will look for the best MLR model for energy as a linear function of the explanatory variables plastics, paper, garbage and water.

1. There is an influential observation in this data set. Which one is it? Explain why you think this observation is influential.
2. Is there multicollinearity in the data set? If yes, explain why you think so and which variables seem to be problematic. If no, explain why you think so.
3. Remove the influential observation.
4. Run the forward selection and backward elimination procedures on these data (with the influential observation removed), without forcing variables in or out of the regression equation. Do you get the same "best" model? Why?
5. If necessary, reduce the data further by removing variable(s) that might be collinear with other variable(s). Be careful not to remove too many variables at a time; I suggest starting with one and checking whether that improves the model. If not, try another, and so on. Write what you did; be very concise. State the final set of variables you decided to keep in the reduced set.
6. Find the best model for the reduced data set (if you reduced it) or for the original data set with the influential observation removed (if you did not find multicollinearity). Use any method you like. Report the method you used and the results.
7. Explain why the model you chose as best is good from (a) a practical (prediction/fit) point of view and (b) a statistical (inference) point of view.

Problem 2. Consider a set of 6 values of a variable x: -0.1, -0.3, -0.5, 5.0, 10.0, 7.0. Find corresponding y's so that the Pearson correlation between the x's and y's is not significantly different from zero, but the Spearman correlation coefficient between the x's and (the same) y's is significantly different from zero. As your solution, provide the method by which you found the y's, print the actual values of the y's you used, and provide the results of testing the relevant hypotheses about the two correlation coefficients. Remember, you have to prove that your y's work as requested using hypothesis tests. Hint: Try for a Spearman correlation equal to 1.

Problem 3. A trucking company considered a multiple regression model relating the total daily travel time y (hours) for one of its drivers to the distance traveled x1 (miles) and the number of deliveries made x2. Suppose that the true regression equation is Y = -0.80 + 0.06x1 + 0.90x2 + ε, where ε ~ N(0, σ² = 0.25).
a) Interpret β1 = 0.06 in the context of the problem.
b) What is the probability that the travel time will be at most 6 hours when 4 deliveries are made and the distance traveled is 50 miles?

Problem 5. These questions are True/False. Circle the correct answer.
a. True or False. To conduct analysis/inference in Multiple Linear Regression, we must assume that the explanatory variables are normally distributed.
b. True or False. A study was conducted to investigate how the suicide rate (Y) varies with age (X) for white males. The fitted linear regression model based on the sample data collected is Y = -13.9 + 0.93X.
Since the estimate of β1 is b1 = 0.93 > 0, we can conclude that for white males there is a positive linear association between suicide rate and age.
c. True or False. In constructing a (1-α)100% confidence interval for the mean response in simple linear regression given the explanatory variable x = x*, the closer x* is to x̄ (the sample mean of x), the narrower the confidence interval.
d. True or False. In studying the relationship between the weights and ages of high school girls, a 95% confidence interval for the mean weight (in pounds) of 12-year-old girls was computed as (102.3, 106.5). This interval is interpreted as: The true mean weight of 12-year-old girls is between 102.3 and 106.5 pounds with probability 0.95.

Problem 4. UNDERGRADUATE STUDENTS ONLY. You will work with the data in the MINITAB project exam2_data2.MPJ on the Pythagoras classdata drive, in the folder labeled math462_652. Find all three correlation coefficients (Pearson, Kendall and Spearman) between y and x and report the results of tests of their significance. As usual, you must state the hypotheses you test and report the value of the test statistic, the p-value and the conclusion. Work on each correlation coefficient is worth 2 points.

Problem 4. GRADUATE STUDENTS ONLY. Generate values for a response y and three explanatory variables (x1, x2, and x3) so that all of the following conditions are satisfied:
a) There are 10 observations;
b) x1 is not a significant predictor for y;
c) x2 and x3 are significant predictors for y;
d) The 10th observation is influential.

1. Describe how you constructed your data set. Explain how you obtained each variable. Please be very concise and precise. Do not print the data set.
2. Show that x1 is not a significant predictor for y. That is, perform a partial F (or t) test of the appropriate hypothesis. Show the results of your MINITAB work for this question.
3. Show that x2 and x3 are significant predictors for y. That is, perform an appropriate F test. Show the results of your MINITAB work for this question.
4. Show that the 10th observation is influential. That is, compute appropriate statistics, show their values and explain why they mean that the observation is indeed influential.

If you cannot generate a data set with all of the above properties, generate one with 10 observations and at least one of properties b) to d). This will give you partial credit and a data set to work on for some of questions 1-4.
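
Note on influence statistics. The exam asks for MINITAB output, so the following is only a minimal sketch, in Python with NumPy, of how two commonly used influence statistics (leverage h_ii and Cook's distance D_i) could be computed by hand as a cross-check. The function name influence_stats and the simulated data are hypothetical and made up for illustration; they are not the exam data and not a suggested answer.

```python
import numpy as np

def influence_stats(X, y):
    """Leverages h_ii and Cook's distances D_i for an OLS fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])      # design matrix with intercept column
    p = A.shape[1]                            # number of regression coefficients
    H = A @ np.linalg.pinv(A.T @ A) @ A.T     # hat matrix H = A (A'A)^-1 A'
    h = np.diag(H)                            # leverages
    resid = y - H @ y                         # ordinary residuals
    mse = resid @ resid / (n - p)             # residual mean square
    cooks = resid**2 / (p * mse) * h / (1.0 - h) ** 2
    return h, cooks

# Illustrative (made-up) data: 10 observations, 3 predictors,
# with the 10th observation placed far from the others.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
X[9] = [8.0, 8.0, 8.0]
y = 1.0 + 2.0 * X[:, 1] - 3.0 * X[:, 2] + rng.normal(scale=0.5, size=10)
y[9] += 10.0

h, cooks = influence_stats(X, y)
for i, (hi, di) in enumerate(zip(h, cooks), start=1):
    print(f"obs {i:2d}: leverage = {hi:.3f}, Cook's D = {di:.3f}")
```

The leverages always sum to the number of fitted coefficients (here 4), which gives a quick sanity check on the computation. Which cutoff to use when calling an observation influential is a judgment call; state the rule you apply.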