Stat 301 A -- Fall 2015 -- Midterm exam 2
3 November 2015
Questions

Instructions:
1. Please put your name on the back of the last page. I don’t want to see your name until I have finished grading.
2. Background information for each problem is given in the second packet.
3. Read each question carefully and completely. Ask if you don’t understand something.
4. Answer each question and show work in the space provided. You may use the back of a page as scratch paper, but I will only read and evaluate what you put in the answer spaces.
5. Use the JMP output wherever you can. I am very happy to answer questions along the lines of ‘(you, pointing to a number on the JMP output) Is this the confidence interval for the regression slope?’
6. Where relevant, all models are fit without centering polynomials.

The questions total 100 points.

Problem 1. Intelligence and future salary

[Figure: four candidate plots, each with vertical axis y; images not reproduced.]

a) 3 pts. Circle the picture that most appropriately depicts the relationship between income and AFQT score.

b) 3 pts. What is the value of r² for the simple linear regression predicting income from the AFQT score?

The test of the null hypothesis that the population correlation is 0 has a p-value of < 0.0001. Use this information, your answers to questions 1a and 1b, and your knowledge to answer question 1c.

c) 12 pts. Circle T for those statements that are true, circle F for those that are false, and circle I where you have insufficient information to decide True or False.
T F I There is no linear association between income and the AFQT score.
T F I The estimated slope in the linear model E Income = β₀ + β₁ AFQT is not zero.
T F I The model E Income = β₀ + β₁ AFQT + β₂ AFQT² will fit significantly better than the linear model.
T F I Dr. Dixon believes the r² statistic should be reported more frequently.
T F I The regression E Income = β₀ + β₁ AFQT is useless because the r² statistic is so small.
T F I Estimating the population correlation requires more assumptions than does regression.

The next six questions concern the model using MotherEd, FatherEd, FamilyIncome78, and AFQT to predict income (output #2).

d) 5 pts. Carefully interpret the slope for FamilyIncome78 (estimated as 0.056).

e) 5 pts. What is the difference in predicted 2005 income between a youth whose parents both had a 10th-grade education and a youth whose parents both had an 11th-grade education? The two youths have the same FamilyIncome78 and the same AFQT score. Show your work or report where in the JMP output you found the answer.

f) 5 pts. The slope for FatherEd in a simple linear regression (details not shown) is reported as 522. Your colleague thinks this must be a typo (i.e., a typing mistake) because the slope for FatherEd in the multiple regression is 122. What do you think?

g) 5 pts. Test the hypothesis that all four regression slopes equal 0. Report the p-value for this test and write a one-sentence conclusion.

h) 3 pts. You would like to use the model to make precise predictions of 2005 income. Specifically, you would like the standard deviation of predicted 2005 income for an individual to be less than $5,000. Is this possible with this model and data? Briefly explain why or why not.

i) 5 pts. Which of the four predictor variables (MotherEd, FatherEd, FamilyIncome78, or AFQT) is the most important predictor of 2005 income? Briefly explain your choice. If you need additional information to answer this question, state what information you need.

The last three questions concern the model with 6 predictor variables (output #3).

j) 5 pts. Does the slope for FamilyIncome78 depend on the subject’s AFQT score? Support your answer with relevant result(s).

k) 5 pts. Carefully interpret the slope for AFQT (the value of 112.7) in this model.

l) 3 pts. Is the relationship between 2005 income and FamilyIncome78 well described by a straight line, or is a more complicated model necessary? Support your answer with relevant result(s).

Problem 2. Longevity of mammal species

The first two questions concern the regression of log longevity on mass and metabolic rate:

a) 3 pts. Write out the model that predicts the log longevity from mass and metabolic rate. Make sure to include the values of the regression coefficients.

b) 5 pts. Estimate how the predicted longevity changes when you compare two species that differ in mass by 100 kg but have the same metabolic rate. Show your work or state where in the output you got your answer.

The rest of the questions for this problem concern the regression of log longevity on log mass and log metabolic rate:

c) 3 pts. Write out the model that predicts the log longevity from log mass and log metabolic rate. Make sure to include the values of the regression coefficients.

d) 5 pts. Estimate how the predicted longevity changes when you compare two species that have the same metabolic rate but the mass of the second is 10 times that of the first. Show your work or state where in the output you got your answer.

e) 3 pts. Is there an issue with the assumption of independence for these data? Briefly explain why or why not.

f) 3 pts. Is there an issue with the assumption of equal variance for these data? Briefly explain why or why not.

g) 3 pts. Is there an issue with the assumption that the model fits these data? Briefly explain why or why not.

h) 3 pts. Is there an issue with multicollinearity with these data? Briefly explain why or why not.

i) 3 pts. Predict the longevity of a species with a mass of 65 kg and a metabolic rate of 7560. Show your work.

j) 5 pts. Does the prediction in the previous question (at a mass of 65 kg and a metabolic rate of 7560) require extrapolating beyond the range of the data? Briefly explain why or why not.

k) 5 pts. Based on what you have learned in previous questions for this problem, do you have any concerns about using the model to predict longevity for a mammal species with a mass of 65 kg and a metabolic rate of 7560? Briefly explain why or why not.
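
Questions 2b and 2d both turn on how a slope estimated on the log scale translates back to the longevity scale when metabolic rate is held fixed. The sketch below illustrates only that arithmetic. The slope values (`B_MASS_LINEAR`, `B_MASS_LOGLOG`) and the function names are hypothetical placeholders, not numbers from the JMP output, and the first function assumes the response is the natural log of longevity. The 10-fold comparison in the log-log model does not depend on the log base, provided both logs use the same base.

```python
import math

# Hypothetical slopes for illustration only; the actual values are in the JMP output.
B_MASS_LINEAR = 0.002  # slope for mass (kg) in: log longevity ~ mass + metabolic rate
B_MASS_LOGLOG = 0.25   # slope for log mass in: log longevity ~ log mass + log metabolic rate


def ratio_for_mass_difference(slope, delta_mass):
    """Multiplicative change in predicted longevity when mass differs by
    delta_mass, metabolic rate is equal, and the response is ln(longevity)."""
    return math.exp(slope * delta_mass)


def ratio_for_mass_multiple(slope, factor):
    """Multiplicative change in predicted longevity when mass is multiplied
    by `factor` in a log-log model, with metabolic rate equal."""
    return factor ** slope


if __name__ == "__main__":
    # Two species differing by 100 kg (model with untransformed mass).
    print(ratio_for_mass_difference(B_MASS_LINEAR, 100))
    # Second species 10 times as heavy as the first (log-log model).
    print(ratio_for_mass_multiple(B_MASS_LOGLOG, 10))
```

With the actual slopes substituted, a ratio above 1 means the heavier species is predicted to live longer by that factor, and a ratio below 1 means it is predicted to live a shorter time.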