Stat 301 B -- Fall 2014 -- Midterm exam 2
6 November 2014

Instructions:
1. Please put your name on the back of the last page. I don’t want to see your name until I have finished grading.
2. Read each question carefully and completely. Ask if you don’t understand something.
3. Answer each question and show work in the space provided. Scratch paper is provided for your use, but I will only read and evaluate what you put in the answer spaces.
4. Use the JMP output. I am very happy to answer questions along the lines of ‘(You pointing to a number on the JMP output) Is this the confidence interval for the regression slope?’.

There are 90 points of questions; you get 10 points “for free”.

Investigators have studied whether the “pace of life” is associated with health. You may be familiar with the stereotypes: Los Angeles is a “laid back” city, with a slower pace of life; New York City is a “hurry up” city, with a faster pace of life. These investigators studied 36 US cities, which can be considered a random sample of US cities with populations larger than 100,000 people. They quantified the pace of life with three measures:
1. bank: the average speed of a specific transaction at a bank
2. walk: the average walking speed of pedestrians in the business district
3. talk: the average number of words per minute spoken by postal clerks

The response variable, heart, is the number of heart attacks per 1000 population. All questions on this exam concern various analyses of this data set.

The packet of JMP output includes:
1. Correlations among each pair of variables and the scatterplot matrix
2. Summary statistics for each variable (condensed from JMP output)
3. Analyze / Fit Model with X = walk
4. Analyze / Fit Model with X = walk and bank
5. Analyze / Fit Model with X = walk, bank, and talk
6. Analyze / Fit Model with X = walk, bank, and walk*bank
7. Analyze / Fit Model with X = walk, bank, and walk²
8. Residual vs. predicted value plot for the model with X = walk and bank

For some questions, but not all, I have indicated parts of the output that might be relevant for a question.

1) 6 pts. JMP output 1 and 3. Five numbers that describe different aspects of the relationship between walk and heart are:
the correlation coefficient: 0.348
the regression slope: 0.423
the standard error of the regression slope: 0.196
the p-value for the regression slope: 0.038
the root mean-squared-error for the regression: 4.96
Which number most clearly: (Note: No explanations needed; just give me the number that is your answer.)
a) describes the strength of the linear association between walk and heart?
b) predicts the difference in the number of heart attacks between a city with a walk value of 15 and a city with a walk value of 16?
c) supports your claim that walking speed helps predict the number of heart attacks?

2) 5 pts. What is the equation that predicts the number of heart attacks per 1000 population from the values of bank, talk, and walk?

3) 5 pts. Another equation that predicts the number of heart attacks per 1000 population from the values of bank, talk, and walk is:
heart = 5.2 + 0.5 Walk + 0.5 Bank + 0.5 Talk
The prediction equation in your answer to question 2 is better than this equation. Explain in what way your answer to question 2 is better.

4) 5 pts. JMP output 1 and 5. The negative coefficient for talk in the model with walk, bank, and talk is somewhat surprising. Is there any concern about multicollinearity in this model? Briefly explain why or why not.

5) 5 pts. Does adding information about the speed of talking (the talk variable) improve predictions of the number of heart attacks, above and beyond what you would predict from bank and walk alone? Briefly explain your answer.

6) 5 pts. JMP output 4. Give a careful interpretation of the estimated coefficient for walk in the model with walk and bank.

7) 5 pts. JMP output 1, 2, and 4.
In the model with X = walk and bank, which variable (walk or bank) is more important in predicting the number of heart attacks? Briefly explain your answer.

8) 5 pts. JMP output 4. The output from the model with X = bank and walk includes a p-value of 0.0214 (underlined in the Analysis of Variance block of output). This p-value of 0.0214 is the result of a test of a particular null hypothesis. What is that null hypothesis?

9) 5 pts. JMP output 4 and 8. Is there any concern about lack of fit of the regression model with X = bank and walk? Briefly explain why or why not.

10) 5 pts. JMP output 4 and 8. Is there any concern about the assumption of equal variance when fitting the model with X = bank and walk? Briefly explain why or why not.

11) 5 pts. Consider the model with X = walk and bank. This model proposes that the relationship between walking speed (walk) and the number of heart attacks is described by a straight line. Is this appropriate, or is the relationship to walking speed something more complicated? Briefly explain your answer.

12) 5 pts. Does the slope of the relationship between walking speed (walk) and the number of heart attacks depend on the speed of a bank transaction (bank)? Briefly explain why or why not.

13) 5 pts. JMP output 6. Give a careful interpretation of the estimated coefficient for walk in the model with walk, bank, and walk*bank.

14) 5 pts. JMP output 6. A friend looks at the parameter estimates from the model with walk, bank, and walk*bank and comments: “All variables are non-significant (p > 0.05), so there is no use trying to predict the number of heart attacks from the walking speed or bank speed.” Do you agree or not? Briefly explain your answer.

15) 5 pts. Consider the model with walk, bank, and walk*bank. Calculate the F statistic that tests the null hypothesis that the slope for bank = 0 and the slope for walk*bank = 0.

16) 5 pts. JMP output 5 and 7.
Could you use a model comparison F statistic to compare the 3-variable model with walk, bank, and talk to the quadratic model with walk and walk²? Briefly explain why or why not.

17) 5 pts. JMP output 1, 2, and 3. Would you have any concerns about using the model with X = walk to predict the number of heart attacks in a city with walk = 35? Briefly explain why or why not.

18) 4 pts. JMP output 1, 2, and 3. A friend is interested in the “backwards” relationship, a regression model to predict Y = walking speed from X = number of heart attacks per 1000 population. What is the slope, β1, for this regression? Show your work.
NOTE: we did not cover the material for Q18 in 2015.
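The relationship behind question 18 can be sketched using the standard identity for least-squares slopes: the slope of heart on walk is r·s_heart/s_walk, and the slope of walk on heart is r·s_walk/s_heart, so the two slopes multiply to r². The Python sketch below is only an illustration under that assumption. It uses the rounded values r = 0.348 and slope = 0.423 quoted in question 1. The function names are hypothetical, and a calculation built from the standard deviations in JMP output 2 may differ slightly because of rounding.

```python
def slope_from_sds(r: float, sd_response: float, sd_predictor: float) -> float:
    """Least-squares slope for predicting the response from the predictor."""
    return r * sd_response / sd_predictor


def backward_slope(r: float, forward_slope: float) -> float:
    """Slope for regressing X on Y, given corr(X, Y) and the slope of Y on X.

    Because slope(Y on X) * slope(X on Y) = r**2, the backward slope is
    r**2 / forward_slope (this requires a nonzero forward slope).
    """
    if forward_slope == 0:
        raise ValueError("forward slope must be nonzero")
    return r ** 2 / forward_slope


if __name__ == "__main__":
    # Rounded values quoted in question 1 (heart regressed on walk).
    r_walk_heart = 0.348
    slope_heart_on_walk = 0.423
    print(round(backward_slope(r_walk_heart, slope_heart_on_walk), 3))  # 0.286
```

The backward slope is not 1/0.423. The two regressions minimize vertical distances in different directions, so the two fitted lines coincide only when |r| = 1.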