1.
Dugongs are large aquatic mammals similar to manatees but native to the Indian and Pacific
Oceans. Data were collected on the age (years) and length (meters) of 27 dugongs captured near
Townsville in north Queensland, Australia. The data are given below.
Age    Length    Age    Length    Age    Length
1      1.80      8      2.47      13     2.47
1.5    1.85      8.5    2.19      14.5   2.56
1.5    1.77      9.5    2.4       15.5   2.47
2.5    2.02      9.5    2.39      16.5   2.64
4      2.27      10     2.41      17     2.56
5      2.15      12     2.50      22.5   2.70
5      2.26      12     2.32      29     2.72
7      2.35      13     2.43      31.5   2.57

a) Plot Length versus Age. Describe the general pattern.

b) Fit a simple linear model relating Length to Age.
   i.   Give the equation of the least squares line.
   ii.  Interpret both the estimated intercept and the estimated slope within the context of the problem.
   iii. Comment on how well the simple linear model fits the data. Be sure to mention the R² value, RMSE, model utility, significance of variables in the model, and the plot of residuals versus Age.

c) Fit a polynomial regression (degree=2) model with Age and Age² as the explanatory variables. Do not center variables.
   i.   Give the equation of the least squares line.
   ii.  Why is it difficult to interpret the parameter estimates for this model?
   iii. Comment on how well the model fits the data. Be sure to mention the R² value, RMSE, model utility, significance of variables in the model, and the plot of residuals versus Age.

d) Fit a polynomial regression (degree=3) with Age, Age² and Age³ as the explanatory variables. Do not center variables.
   i.   Give the equation of the least squares line.
   ii.  Comment on how well the model fits the data. Be sure to mention the R² value, RMSE, model utility, and the significance of variables in the model.

e) For the model with Age, Age² and Age³ as the explanatory variables, look at the distribution of residuals. Comment on the conditions of identically and normally distributed errors. Be sure to refer to the appropriate plots in your comments.

f) Which model b), c) or d) does a better job of predicting the lengths of dugongs? To answer this question you should look at the predictions especially for older dugongs. Note: This question is not asking which model is the best statistical model.

g) Report the correlations between Age and Age², Age and Age³, Age² and Age³. Is there statistically significant multicollinearity?
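As a starting point for parts b) through d) and g), the fits and correlations can be sketched in Python with NumPy. This is only an illustrative sketch, not the required software or method: the helper name `fit_polynomial` is hypothetical, the arrays hold just the 24 (Age, Length) pairs printed in the table, and RMSE is assumed to mean the residual standard error, sqrt(SSE / (n - p)), which is the convention of most regression output.

```python
import numpy as np

# The 24 (Age, Length) pairs printed in the table above, read column pair by
# column pair (the text mentions 27 dugongs; only the listed pairs are used).
age = np.array([1, 1.5, 1.5, 2.5, 4, 5, 5, 7,
                8, 8.5, 9.5, 9.5, 10, 12, 12, 13,
                13, 14.5, 15.5, 16.5, 17, 22.5, 29, 31.5])
length = np.array([1.80, 1.85, 1.77, 2.02, 2.27, 2.15, 2.26, 2.35,
                   2.47, 2.19, 2.40, 2.39, 2.41, 2.50, 2.32, 2.43,
                   2.47, 2.56, 2.47, 2.64, 2.56, 2.70, 2.72, 2.57])


def fit_polynomial(x, y, degree):
    """Least squares fit of y on 1, x, ..., x**degree (uncentered).

    Hypothetical helper. Returns (coefficients, R^2, RMSE), where RMSE is
    assumed to be sqrt(SSE / (n - number of coefficients)).
    """
    X = np.vander(x, degree + 1, increasing=True)  # columns: 1, x, x^2, ...
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    r2 = 1 - sse / sst
    rmse = float(np.sqrt(sse / (len(y) - (degree + 1))))
    return coef, r2, rmse


# Models b), c) and d): degree 1, 2 and 3 in raw Age.
fits = {d: fit_polynomial(age, length, d) for d in (1, 2, 3)}
for d, (coef, r2, rmse) in fits.items():
    print(f"degree {d}: coef={np.round(coef, 5)}, R^2={r2:.4f}, RMSE={rmse:.4f}")

# Part g): pairwise correlations among Age, Age^2 and Age^3.
powers_corr = np.corrcoef([age, age**2, age**3])
print(np.round(powers_corr, 4))
```

The sketch does not produce the t-tests, the overall F-test for model utility, or the residual plots that the questions ask about; those would come from a full regression routine in whatever statistics package the course uses.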
h) Fit a polynomial regression (degree=2) with Age and (Age – Mean Age)².
   i.   Give the equation of the least squares line.
   ii.  Comment on how well the model fits the data. Be sure to mention the R² value, RMSE, model utility, significance of variables in the model, and the plot of residuals versus Age.
   iii. How does this model compare to the model in c)?

i) Fit a polynomial regression (degree=3) with Age, (Age – Mean Age)² and (Age – Mean Age)³ as the explanatory variables.
   i.   Give the equation of the least squares line.
   ii.  Comment on how well the model fits the data. Be sure to mention the R² value, RMSE, model utility, and the significance of variables in the model.
   iii. How does this model compare to the model in d)?

j) Report the correlations between Age and (Age – Mean Age)², Age and (Age – Mean Age)³, (Age – Mean Age)² and (Age – Mean Age)³. Is there statistically significant multicollinearity?
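The effect of centering in parts h) through j) can be explored with a short NumPy sketch. It is illustrative only and rests on stated assumptions: the arrays hold the 24 pairs printed in the data table, and the names `centered`, `X_raw` and `X_ctr` are introduced here for the example. Because (Age – Mean Age)² is a linear combination of 1, Age and Age², the centered and uncentered quadratic models span the same column space, so their fitted values should agree even though the intercept and the Age coefficient change.

```python
import numpy as np

# The 24 (Age, Length) pairs printed in the data table
# (the text mentions 27 dugongs; only the listed pairs are used).
age = np.array([1, 1.5, 1.5, 2.5, 4, 5, 5, 7,
                8, 8.5, 9.5, 9.5, 10, 12, 12, 13,
                13, 14.5, 15.5, 16.5, 17, 22.5, 29, 31.5])
length = np.array([1.80, 1.85, 1.77, 2.02, 2.27, 2.15, 2.26, 2.35,
                   2.47, 2.19, 2.40, 2.39, 2.41, 2.50, 2.32, 2.43,
                   2.47, 2.56, 2.47, 2.64, 2.56, 2.70, 2.72, 2.57])

centered = age - age.mean()  # Age - Mean Age
ones = np.ones_like(age)

# Quadratic model from c) (raw Age^2) and from h) (centered square).
X_raw = np.column_stack([ones, age, age**2])
X_ctr = np.column_stack([ones, age, centered**2])
coef_raw, *_ = np.linalg.lstsq(X_raw, length, rcond=None)
coef_ctr, *_ = np.linalg.lstsq(X_ctr, length, rcond=None)

# Same column space, so the fitted values (and hence R^2, RMSE) should match.
same_fit = np.allclose(X_raw @ coef_raw, X_ctr @ coef_ctr)
print("identical fitted values:", same_fit)
print("raw coefficients:     ", np.round(coef_raw, 5))
print("centered coefficients:", np.round(coef_ctr, 5))

# Parts g) and j): correlations among the explanatory variables.
raw_corr = np.corrcoef([age, age**2, age**3])
ctr_corr = np.corrcoef([age, centered**2, centered**3])
print("uncentered:\n", np.round(raw_corr, 4))
print("centered:\n", np.round(ctr_corr, 4))
```

Comparing the two correlation matrices shows how much centering reduces the correlation between Age and the squared term; whether it helps for the cubic term is best read from the output rather than assumed.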