Stat 301 B -- Fall 2014 -- Final exam
15 December 2014
Instructions:
1. Please put your name on the back of the last page. I don’t want to see your name until I have
finished grading.
2. Read each question carefully and completely. Ask if you don’t understand something.
3. Answer each question and show work in the space provided. Scratch paper is provided for your
use, but I will only read and evaluate what you put in the answer spaces.
4. Use the JMP output. I am very happy to answer questions along the lines of ‘(You pointing to a
number on the JMP output) Is this the confidence interval for the regression slope?’.
There are 110 points of questions; you get 15 points for free.
Part I: Babies typically start to crawl when around 7 months old, but there is considerable individual
variability. The child development literature suggests that the age at which a baby first crawls depends on the air
temperature during development. Babies born in April tend to crawl at about 6½ months, while babies
born in September tend to crawl at almost 8 months. One suggested reason is that September babies
develop over the winter and so are often wrapped up in more clothes than April babies. The following
data were collected to evaluate whether development temperature is associated with the time to crawl.
A total of 120 babies were randomly selected from all births at a large metropolitan hospital such that there were
10 babies per birth month (10 in January, 10 in February, …, 10 in December). The following variables
were collected:
temperature: average outdoor air temperature during each baby’s development.
apgar: the Apgar score of the newborn baby, a measure of the newborn’s health. 10 is
fantastic, 9 is typical, and 3 or 4 indicates potential developmental issues.
weight: birth weight of the baby, in pounds.
length: length of the baby at birth, in inches.
gestation: gestation time (time from conception to birth), in weeks.
The JMP output packet includes:
Model fit statistics for the best 4 models with 1, 2, 3, 4, and 5 variables.
Parameter estimates and additional information for selected models, including a model with:
Temperature only
Temperature, Apgar
Temperature, Apgar, Weight
Temperature, Apgar, Weight, Gestation
Temperature, Apgar, Weight, Length, Gestation
Error sums-of-squares for these models
Regression diagnostic plots for the model with temperature, apgar, weight, and length.
5 pts. You want to find the most appropriate set of variables to predict time-to-crawl. If you use AICc as
your criterion, which variables should be included in your model?
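AICc trades off how well a model fits against how many parameters it uses. As a rough sketch, assuming normally distributed errors and counting the intercept, each slope, and the error variance as parameters (conventions differ, and JMP's own bookkeeping may not match exactly), AICc can be computed from a model's error sum of squares as follows. The function name `aicc_from_sse` and the SSE values in the usage lines are hypothetical illustrations, not numbers from the JMP output packet.

```python
import math


def aicc_from_sse(sse: float, n: int, n_predictors: int) -> float:
    """Sketch of AICc for a Gaussian linear regression, computed from its SSE.

    Assumption: the parameter count k includes the intercept, one slope per
    predictor, and the error variance, so k = n_predictors + 2.
    """
    k = n_predictors + 2
    if n - k - 1 <= 0:
        raise ValueError("the small-sample correction needs n > k + 1")
    sigma2_mle = sse / n  # maximum-likelihood estimate of the error variance
    # -2 * log-likelihood of a normal model evaluated at the MLE
    neg2_loglik = n * (math.log(2 * math.pi * sigma2_mle) + 1)
    aic = neg2_loglik + 2 * k
    # small-sample correction that turns AIC into AICc
    return aic + 2 * k * (k + 1) / (n - k - 1)


if __name__ == "__main__":
    # Hypothetical SSE values, NOT taken from the exam's JMP output.
    four_vars = aicc_from_sse(sse=300.0, n=120, n_predictors=4)
    five_vars = aicc_from_sse(sse=299.5, n=120, n_predictors=5)
    # Lower AICc is preferred; a tiny drop in SSE may not pay for an extra parameter.
    print("4-variable model:", round(four_vars, 2))
    print("5-variable model:", round(five_vars, 2))
```

With these made-up numbers the five-variable model has the smaller SSE but the larger AICc, because the penalty for the extra parameter outweighs the small improvement in fit.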
5 pts. A friend suggests that the two variable model using temperature and apgar makes good
predictions. Do you agree? Briefly explain why or why not.
5 pts. The five variable model, with temperature, apgar, weight, length, and gestation, has the highest
RSquare statistic (in the model fit table) and smallest Sum-of-Squared-Errors (not shown, but it is
the smallest). Will this model make the best predictions of new observations? Briefly explain why
or why not.
5 pts. You are specifically interested in the association of temperature and time-to-crawl. The other
four variables (apgar, weight, length, and gestation) are considered nuisance variables. You want to
evaluate the association of temperature and time-to-crawl after adjusting for an appropriate set of
nuisance variables. What nuisance variables should be included in your model? Briefly explain your
choice.
5 pts. What is the regression slope describing the relationship between temperature and time-to-crawl
after adjusting for your appropriate set of nuisance variables?
5 pts. What is the simple linear regression slope describing the relationship between temperature and
time-to-crawl?
For the questions on this page, consider the model with temperature, apgar, weight, and length.
5 pts. Do you have any concerns about influential observations? Briefly explain why or why not.
5 pts. Do you have any concerns with outlying observations? Briefly explain why or why not.
5 pts. Do you have any concerns about multicollinearity? Briefly explain why or why not.
5 pts. Do you have any concerns with lack-of-fit of the model? Briefly explain why or why not.
5 pts. There are 120 observations in the data set. Calculate how many degrees of freedom are
associated with the mean squared error for the model with temperature, apgar, weight, and length.
5 pts. Interpret, in the context of this data set and model, the slope for temperature.
5 pts. Test the hypothesis that the slope for length = 0, in the model with temperature, apgar, weight,
and length. Report the test statistic and p-value.
5 pts. Write a one-sentence conclusion describing the result of the hypothesis test in the previous
question.
5 pts. Another model using the same 4 variables is:
$E[\mathrm{Time}] = 25 - 0.1\,\mathrm{Temperature} - 2\,\mathrm{Apgar} + 3.1\,\mathrm{Weight} + 0.2\,\mathrm{Length}$
The model fitted by JMP is better than this model. In what way is the model fitted by JMP better?
Part II: Playing a musical instrument is claimed to increase brain development. The following data were
collected from a group of college students. Each student reported the number of years they had played a
musical instrument and whether that instrument was a string instrument or a wind instrument. An MRI
scan of the brain produced a measure of neuron activity called the NAI (neuronal activity index). The
investigators are interested in whether neuronal activity was increased by playing an instrument and
whether there was any difference between playing a wind or a string instrument. They fit the model:
𝐸 πΏπ‘œπ‘” 𝑁𝐴𝐼 = 𝛽0 + 𝛽1 π‘Œπ‘’π‘Žπ‘Ÿπ‘  + 𝛽2 πΌπ‘›π‘ π‘‘π‘Ÿπ‘’π‘šπ‘’π‘›π‘‘
where Years is the number of years they played and Instrument is an indicator variable with the value 0
for a wind instrument and 1 for a string instrument.
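Substituting the two possible values of the indicator into this model shows what it assumes: one line in Years for each instrument type, with a common slope and intercepts that differ by $\beta_2$.

$$
\begin{aligned}
\text{Wind } (\mathrm{Instrument}=0):&\quad E[\log \mathrm{NAI}] = \beta_0 + \beta_1\,\mathrm{Years}\\
\text{String } (\mathrm{Instrument}=1):&\quad E[\log \mathrm{NAI}] = (\beta_0 + \beta_2) + \beta_1\,\mathrm{Years}
\end{aligned}
$$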
Part of the JMP Fit Model output is:
5 pts. Predict the log NAI for someone playing a string instrument for 5 years.
5 pts. Estimate the difference in log NAI between someone playing a string instrument for 5 years and
someone playing a wind instrument for 5 years.
5 pts. Test the hypothesis of no difference in log NAI between someone playing a string instrument for
5 years and someone playing a wind instrument for 5 years. Report the p-value for this test.
5 pts. For this population of students, make an inference about the effect of playing an instrument for
an additional year.
5 pts. Describe the effect of playing an instrument for a year on NAI. Note: This question asks for the
effect on untransformed NAI, not log NAI.
The investigators also fit the model that includes the interaction between Instrument and Years:
$E[\log \mathrm{NAI}] = \beta_0 + \beta_1\,\mathrm{Years} + \beta_2\,\mathrm{Instrument} + \beta_3\,\mathrm{Instrument} \times \mathrm{Years}$
for which the parameter estimates are:
5 pts. Give a careful interpretation of the estimated coefficient for years in the model with the
Instrument*Years interaction.
5 pts. The investigators wonder whether the difference in log NAI between string and wind instruments
is the same no matter how long they have been played. What can you tell them?