Stat 301 A -- Fall 2015 -- Midterm... 3 November 2015 Questions

Stat 301 A -- Fall 2015 -- Midterm exam 2
3 November 2015
Questions
Instructions:
1. Please put your name on the back of the last page. I don’t want to see your name until I have
finished grading.
2. Background information for each problem is given in the second packet.
3. Read each question carefully and completely. Ask if you don’t understand something.
4. Answer each question and show work in the space provided. You may use the back of a page as
scratch paper, but I will only read and evaluate what you put in the answer spaces.
5. Use the JMP output wherever you can. I am very happy to answer questions along the lines of
‘(You pointing to a number on the JMP output) Is this the confidence interval for the regression
slope?’.
6. Where relevant, all models are fit without centering polynomials.
The questions total 100 points.
Problem 1. Intelligence and future salary
[Figure: four candidate scatterplots, each with y on the vertical axis; images not reproduced.]
a) 3 pts. Circle the picture that most appropriately depicts the relationship between income and AFQT
score.
b) 3 pts. What is the value of r² for the simple linear regression predicting income from the AFQT score?
The test of the null hypothesis that the population correlation is 0 has a p-value of < 0.0001. Use this
information, your answers to questions 1a and 1b, and your knowledge to answer question 1c.
c) 12 pts. Circle T for those statements that are true, circle F for those that are false, and circle I where
you have insufficient information to decide True or False.
T  F  I  There is no linear association between income and the AFQT score.
T  F  I  The estimated slope in the linear model: E Income = β0 + β1 AFQT is not zero.
T  F  I  The model E Income = β0 + β1 AFQT + β2 AFQT² will fit significantly better than does the
linear model.
T  F  I  Dr. Dixon believes the r² statistic should be reported more frequently.
T  F  I  The regression: E Income = β0 + β1 AFQT is useless because the r² statistic is so small.
T  F  I  Estimating the population correlation requires more assumptions than does regression.
The next six questions concern the model using MotherEd, FatherEd, FamilyIncome78, and AFQT to
predict income (output #2).
d) 5 pts. Carefully interpret the slope for FamilyIncome78 (estimated as 0.056).
e) 5 pts. What is the difference in predicted 2005 income between a youth whose parents both had a
10th grade education and a youth whose parents both had an 11th grade education? The two youths
have the same FamilyIncome78 and same AFQT score. Show your work or report where in the JMP
output you found the answer.
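As a rough illustration of the arithmetic behind a question like this, the sketch below computes the change in a fitted multiple-regression mean when two predictors each shift by one unit and the others are held fixed. The intercept and the unchanged predictors cancel, so only the slopes of the shifted predictors matter. The FatherEd and FamilyIncome78 slopes are the ones quoted in this problem; the MotherEd and AFQT slopes are placeholders, not values from the JMP output, and the function name is hypothetical.

```python
# Slopes for a model like output #2. FatherEd (122) and FamilyIncome78 (0.056)
# are quoted in the exam text; MotherEd and AFQT are PLACEHOLDER values,
# not numbers from the JMP output.
slopes = {
    "MotherEd": 300.0,        # placeholder
    "FatherEd": 122.0,        # quoted in question f
    "FamilyIncome78": 0.056,  # quoted in question d
    "AFQT": 150.0,            # placeholder
}


def predicted_difference(slopes, changes):
    """Change in the fitted mean when each named predictor shifts by changes[name].

    Predictors not listed in `changes` are held fixed, so they (and the
    intercept) drop out of the difference.
    """
    return sum(slopes[name] * delta for name, delta in changes.items())


# Both parents have one more year of education; everything else is unchanged.
diff = predicted_difference(slopes, {"MotherEd": 1, "FatherEd": 1})
print(diff)
```

With the actual JMP estimates substituted for the placeholders, the same sum of two slopes gives the requested difference.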
f) 5 pts. The slope for FatherEd in a simple linear regression (details not shown) is reported as 522.
Your colleague thinks this must be a typo (i.e., a typing mistake) because the slope for FatherEd in the
multiple regression is 122. What do you think?
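A small sketch may help show why a slope can legitimately differ between a simple and a multiple regression. The data below are invented for illustration (they are not the exam data): income is built exactly from two predictors with partial slopes 100 and 50, and the two predictors are positively associated. The simple-regression slope then absorbs part of the second predictor's effect.

```python
def slope(x, y):
    """Least-squares slope from regressing y on x (simple linear regression)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# INVENTED data, chosen so the arithmetic is easy to follow.
father_ed = [8, 10, 12, 14, 16]
afqt = [20, 35, 40, 65, 70]  # tends to rise with father_ed

# Income generated exactly by a two-predictor model:
# partial slope 100 for father_ed, partial slope 50 for afqt.
income = [100 * f + 50 * a for f, a in zip(father_ed, afqt)]

# Simple regression of income on father_ed alone.
simple_slope = slope(father_ed, income)

# The simple slope equals the partial slope plus the afqt effect
# carried along by the association between afqt and father_ed.
carried_along = 50 * slope(father_ed, afqt)

print(simple_slope, 100 + carried_along)
```

In this constructed example the simple slope is 425 while the partial slope is 100, so a large gap between the two is not by itself evidence of a typing mistake.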
g) 5 pts. Test the hypothesis that all four regression slopes equal 0. Report the p-value for this test and
write a one sentence conclusion.
h) 3 pts. You would like to use the model to make precise predictions of 2005 income. Specifically, you
would like the standard deviation of predicted 2005 income for an individual to be less than $5,000. Is
this possible with this model and data? Briefly explain why or why not.
i) 5 pts. Which of the four predictor variables (MotherEd, FatherEd, FamilyIncome78, or AFQT) is the
most important predictor of 2005 income? Briefly explain your choice. If you need additional
information to answer this question, state what information you need.
The last three questions concern the model with 6 predictor variables (output #3).
j) 5 pts. Does the slope for FamilyIncome78 depend on the subject’s AFQT score? Support your answer
with relevant result(s).
k) 5 pts. Carefully interpret the slope for AFQT (the value of 112.7) in this model.
l) 3 pts. Is the relationship between 2005 income and FamilyIncome78 well described by a straight line,
or is a more complicated model necessary? Support your answer with relevant result(s).
Problem 2. Longevity of mammal species
The first two questions concern the regression of log longevity on mass and metabolic rate:
a) 3 pts. Write out the model that predicts the log longevity from mass and metabolic rate. Make sure
to include the values of the regression coefficients.
b) 5 pts. Estimate how the predicted longevity changes when you compare two species that differ by
100 kg but have the same metabolic rate. Show your work or state where in the output you got your
answer.
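When the response is log longevity and mass enters untransformed, an additive difference in mass corresponds to a multiplicative change in predicted longevity. The sketch below shows that back-transformation under two loudly stated assumptions: the mass slope used is a made-up value, not the JMP estimate, and the log is taken to be the natural log (for a base-10 log, replace `math.exp(x)` with `10 ** x`).

```python
import math

# HYPOTHETICAL slope for mass (per kg) in a model for log longevity;
# not the value from the JMP output.
b_mass = 0.002


def longevity_ratio_for_mass_difference(slope_per_kg, mass_difference_kg):
    """Ratio of predicted longevities for two species whose masses differ by
    mass_difference_kg, with metabolic rate held fixed.

    Assumes the response is the NATURAL log of longevity.
    """
    change_in_log_longevity = slope_per_kg * mass_difference_kg
    return math.exp(change_in_log_longevity)


ratio = longevity_ratio_for_mass_difference(b_mass, 100)
print(ratio)
```

The intercept and the metabolic-rate term cancel in the comparison, so only the mass slope and the 100 kg difference enter the calculation.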
The rest of the questions for this problem concern the regression of log longevity on log mass and log
metabolic rate:
c) 3 pts. Write out the model that predicts the log longevity from log mass and log metabolic rate.
Make sure to include the values of the regression coefficients.
d) 5 pts. Estimate how the predicted longevity changes when you compare two species that have the
same metabolic rate but the mass of the second is 10 times that of the first. Show your work or state
where in the output you got your answer.
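In a log-log model, multiplying mass by a factor k multiplies predicted longevity by k raised to the log-mass slope, provided the same log base is used for the response and the predictor. The sketch below illustrates this with a made-up slope (not the JMP estimate) and checks the shortcut against the longer route through natural logs; the function name is hypothetical.

```python
import math

# HYPOTHETICAL slope for log mass in a model for log longevity;
# not the value from the JMP output.
b_log_mass = 0.25


def longevity_ratio_for_mass_factor(slope, mass_factor):
    """Ratio of predicted longevities when mass is multiplied by mass_factor
    and metabolic rate is held fixed (log-log model, same base on both sides)."""
    return mass_factor ** slope


ratio = longevity_ratio_for_mass_factor(b_log_mass, 10)

# Same result the long way: change in log longevity is slope * log(10),
# then back-transform with exp.
check = math.exp(b_log_mass * math.log(10))

print(ratio, check)
```

Because the base of the logarithm cancels, the shortcut applies whether the output was produced with natural or base-10 logs.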
e) 3 pts. Is there an issue with the assumption of independence for these data? Briefly explain why or
why not.
f) 3 pts. Is there an issue with the assumption of equal variance for these data? Briefly explain why or
why not.
g) 3 pts. Is there an issue with the assumption that the model fits these data? Briefly explain why or
why not.
h) 3 pts. Is there an issue with multicollinearity with these data? Briefly explain why or why not.
i) 3 pts. Predict the longevity of a species with a mass of 65 kg and a metabolic rate of 7560. Show your
work.
j) 5 pts. Does the prediction in the previous question (at mass of 65 kg and metabolic rate of 7560)
require extrapolating beyond the range of the data? Briefly explain why or why not.
k) 5 pts. Based on what you have learned in previous questions for this problem, do you have any
concerns about using the model to predict longevity for a mammal species with a mass of 65 kg and
metabolic rate of 7560? Briefly explain why or why not.