Stat 301 B -- Fall 2014 -- Final exam
15 December 2014

Instructions:
1. Please put your name on the back of the last page. I don’t want to see your name until I have finished grading.
2. Read each question carefully and completely. Ask if you don’t understand something.
3. Answer each question and show work in the space provided. Scratch paper is provided for your use, but I will only read and evaluate what you put in the answer spaces.
4. Use the JMP output. I am very happy to answer questions along the lines of ‘(You pointing to a number on the JMP output) Is this the confidence interval for the regression slope?’.

There are 110 points of questions; you get 15 points for free.

Part I:
Babies typically start to crawl when around 7 months old, but there is considerable individual variability. The child development literature suggests the age that a baby first crawls depends on the air temperature during development. Babies born in April tend to crawl at about 6½ months, while babies born in September tend to crawl at almost 8 months. One suggested reason is that September babies develop over the winter and so are often wrapped up in more clothes than April babies.

The following data were collected to evaluate whether development temperature is associated with the time to crawl. 120 babies were randomly selected from all births at a large metropolitan hospital such that there were 10 babies per birth month (10 in January, 10 in February, … 10 in December).

The following variables were collected:
temperature: average outdoor air temperature during each baby’s development
apgar: the apgar score of the newborn baby. This is a measure of health of the newborn. 10 is fantastic, 9 is typical, and 3 or 4 indicates potential developmental issues.
weight: birth weight of the baby, in pounds.
length: length of baby at birth, in inches
gestation: gestation time (time from conception to birth), in weeks

The JMP output packet includes:
Model fit statistics for the best 4 models with 1, 2, 3, 4, and 5 variables.
Parameter estimates and additional information for selected models, including a model with:
    Temperature only
    Temperature, Apgar
    Temperature, Apgar, Weight
    Temperature, Apgar, Weight, Gestation
    Temperature, Apgar, Weight, Length, Gestation
Error sums-of-squares for these models
Regression diagnostic plots for the model with temperature, apgar, weight, and length.

5 pts. You want to find the most appropriate set of variables to predict time-to-crawl. If you use AICc as your criterion, which variables should be included in your model?

5 pts. A friend suggests that the two variable model using temperature and apgar makes good predictions. Do you agree? Briefly explain why or why not.

5 pts. The five variable model, with temperature, apgar, weight, length, and gestation, has the highest RSquare statistic (in the model fit table) and smallest Sum-of-Squared-Errors (not shown, but it is the smallest). Will this model make the best predictions of new observations? Briefly explain why or why not.

5 pts. You are specifically interested in the association of temperature and time-to-crawl. The other four variables (apgar, weight, length, and gestation) are considered nuisance variables. You want to evaluate the association of temperature and time-to-crawl after adjusting for an appropriate set of nuisance variables. What nuisance variables should be included in your model? Briefly explain your choice.

5 pts. What is the regression slope describing the relationship between temperature and time-to-crawl after adjusting for your appropriate set of nuisance variables?

5 pts. What is the simple linear regression slope describing the relationship between temperature and time-to-crawl?
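For reference, the following is a minimal sketch of how an AICc comparison across models of different size can be computed from error sums-of-squares. It assumes the common least-squares form of AICc (defined up to an additive constant) and counts the intercept and the error variance as parameters; JMP's reported values may differ by a constant. The function name and the sums-of-squares below are hypothetical and are not the values in the JMP packet.

```python
import math

def aicc(sse, n, n_predictors):
    """AICc for a least-squares fit, up to an additive constant.

    Assumption: k counts the slopes, the intercept, and the error variance.
    """
    k = n_predictors + 2
    aic = n * math.log(sse / n) + 2 * k
    # Small-sample correction term that distinguishes AICc from AIC.
    return aic + 2 * k * (k + 1) / (n - k - 1)

n = 120  # number of babies in the data set

# Hypothetical error sums-of-squares keyed by number of predictors.
# These are made-up numbers, NOT the values from the JMP output.
hypothetical_sse = {1: 400.0, 2: 330.0, 3: 300.0, 4: 299.0, 5: 298.5}

scores = {p: aicc(sse, n, p) for p, sse in hypothetical_sse.items()}
best = min(scores, key=scores.get)  # smaller AICc is preferred

for p in sorted(scores):
    print(f"{p} predictors: AICc = {scores[p]:.2f}")
print("Smallest AICc:", best, "predictors")
```

With these made-up numbers the five-predictor model has the smallest error sum-of-squares, yet the penalty for extra parameters leaves a smaller model with the smallest AICc.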
For the questions on this page, consider the model with temperature, apgar, weight, and length.

5 pts. Do you have any concerns about influential observations? Briefly explain why or why not.

5 pts. Do you have any concerns with outlying observations? Briefly explain why or why not.

5 pts. Do you have any concerns about multicollinearity? Briefly explain why or why not.

5 pts. Do you have any concerns with lack-of-fit of the model? Briefly explain why or why not.

5 pts. There are 120 observations in the data set. Calculate how many degrees of freedom are associated with the Mean squared error for the model with temperature, apgar, weight, and length.

5 pts. Interpret, in the context of this data set and model, the slope for temperature.

5 pts. Test the hypothesis that the slope for length = 0, in the model with temperature, apgar, weight, and length. Report the test statistic and p-value.

5 pts. Write a one-sentence conclusion describing the result of the hypothesis test in the previous question.

5 pts. Another model using the same 4 variables is:
    E Time = 25 – 0.1 Temperature – 2 Apgar + 3.1 Weight + 0.2 Length
The model fitted by JMP is better than this model. In what way is the model fitted by JMP better?

Part II:
Playing a musical instrument is claimed to increase brain development. The following data were collected on a collection of college students. They reported the number of years they had played a musical instrument and whether that instrument was a string instrument or a wind instrument. An MRI scan of the brain produced a measure of neuron activity called the NAI (neuronal activity index). The investigators are interested in whether neuronal activity was increased by playing an instrument and whether there was any difference between playing a wind or a string instrument.
They fit the model:
    E Log NAI = β0 + β1 Years + β2 Instrument
where Years is the number of years they played and Instrument is an indicator variable with the value 0 for a wind instrument and 1 for a string instrument. Part of the JMP Fit Model output is:

5 pts. Predict the log NAI for someone playing a string instrument for 5 years.

5 pts. Estimate the difference in log NAI between someone playing a string instrument for 5 years and someone playing a wind instrument for 5 years.

5 pts. Test the hypothesis of no difference in log NAI between someone playing a string instrument for 5 years and someone playing a wind instrument for 5 years. Report the p-value for this test.

5 pts. For this population of students, make an inference about the effect of playing an instrument for an additional year.

5 pts. Describe the effect of playing an instrument for a year on NAI. Note: This question asks for the effect on untransformed NAI, not log NAI.

The investigators also fit the model that includes the interaction between Instrument and Years
    E Log NAI = β0 + β1 Years + β2 Instrument + β3 Instrument ∗ Years
for which the parameter estimates are:

5 pts. Give a careful interpretation of the estimated coefficient for years in the model with the Instrument*Years interaction.

5 pts. The investigators wonder whether the difference in log NAI between string and wind instruments is the same no matter how long they have been played. What can you tell them?
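For reference, the following is a minimal sketch of how the 0/1 indicator coding behaves in the two models of Part II. The coefficients are hypothetical placeholders, not the JMP estimates, the function names are made up for illustration, and the back-transformation assumes the log is a natural log.

```python
import math

# Hypothetical coefficients for illustration only, NOT the JMP estimates.
B0, B1, B2, B3 = 1.0, 0.05, 0.2, 0.01

def log_nai_additive(years, instrument):
    """Model without interaction; instrument is 0 (wind) or 1 (string)."""
    return B0 + B1 * years + B2 * instrument

def log_nai_interaction(years, instrument):
    """Model with the Instrument*Years interaction term."""
    return B0 + B1 * years + B2 * instrument + B3 * instrument * years

years_played = (1, 5, 10)

# String minus wind at the same number of years, under each model.
gap_additive = [log_nai_additive(y, 1) - log_nai_additive(y, 0)
                for y in years_played]
gap_interaction = [log_nai_interaction(y, 1) - log_nai_interaction(y, 0)
                   for y in years_played]

# A difference of B1 on the natural-log scale corresponds to multiplying
# the untransformed response by exp(B1).
ratio_per_year = math.exp(B1)

print("Additive model gaps:   ", gap_additive)
print("Interaction model gaps:", gap_interaction)
print("Multiplicative change per year:", ratio_per_year)
```

In the model without interaction the string-minus-wind gap is the same at every value of Years, while in the interaction model it changes with Years.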