Stat 301: HW 8 answers

1. Model academic salaries (4 pts, 2 for the problem, 2 for the solution):
Short answer: The proposed model forces the difference between each pair of adjacent academic ranks
to be β1, which is too restrictive. A better model allows each academic rank to have its own mean
salary. One way to do this is to create 3 indicator variables.
Explanation: The proposed model treats the mean salary for each of the four academic ranks as a linear
regression on the “level” (0, 1, 2, 3). According to this model, the mean salary for lecturers is β0, the
mean salary for asst. profs is β0 + β1, the mean salary for assoc. profs is β0 + 2 β1, and the mean salary for
profs is β0 + 3 β1. This forces the difference between each pair of adjacent academic ranks to be β1. The
data may easily show otherwise. E.g., mean salary may well increase with rank, but the difference
between lecturers and asst. profs need not be the same as the difference between asst. and assoc. profs.
One way to specify a model that allows each rank to have its own mean salary is to create 3 indicator
variables that specify the differences between the groups. The approach in lecture and the book is to
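define one group as the reference group (e.g. lecturers) and create 3 indicator variables (e.g. one for asst. profs,
one for assoc. profs, and one for profs). Each indicator equals 1 for observations in its group and 0 otherwise,
so the coefficient of each indicator is the difference between that group’s mean salary and the reference group’s mean.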
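The reference-group coding described above can be sketched in a few lines of Python. This is only an illustration: the rank labels, the function names, and the coefficient values in the usage note are hypothetical and do not come from the homework data.

```python
# Hypothetical rank labels; lecturers serve as the reference group.
RANKS = ["lecturer", "asst", "assoc", "prof"]
REFERENCE = "lecturer"


def indicator_row(rank):
    """Return the 3 indicator values (asst, assoc, prof) for one person."""
    if rank not in RANKS:
        raise ValueError(f"unknown rank: {rank}")
    # One 0/1 indicator per non-reference rank.
    return [1 if rank == level else 0 for level in RANKS if level != REFERENCE]


def mean_salary(rank, b0, b_asst, b_assoc, b_prof):
    """Mean salary under E Y = b0 + b_asst*asst + b_assoc*assoc + b_prof*prof."""
    asst, assoc, prof = indicator_row(rank)
    return b0 + b_asst * asst + b_assoc * assoc + b_prof * prof


for rank in RANKS:
    print(rank, indicator_row(rank))
```

With made-up coefficients such as b0 = 50, b_asst = 10, b_assoc = 25, b_prof = 45, the four means are 50, 60, 75, and 95. The gaps between adjacent ranks (10, 15, 20) are free to differ, which the single-slope model does not allow.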
2. Pace of life – model comparison
a. 2 pts.
Full model: E Y = β0 + β1 bank + β2 walk + β3 talk + β4 bank*walk
Reduced model: E Y = β0 + β2 walk + β3 talk
b. 4 pts. F statistic from comparing the two fitted models in an ANOVA table:
Source          df   SS        MS      F
Difference       2   125.547   62.77   2.737
Full Error      31   710.849   22.93
Reduced Error   33   836.396
Notes: Fit the Full and Reduced models separately. In each model’s JMP Analysis of Variance table (not the
Lack of Fit table!), find the Error line; its df and SS are the error df and sum of squares for that model.
The Difference row is Reduced minus Full within each column. MS is SS / df within each row. F is
MS for Difference / MS for Full Error.
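The arithmetic in these notes can be checked with a short Python sketch. The df and SS values are the ones in the table above; the function name is made up for illustration.

```python
def extra_ss_f(full_df, full_ss, reduced_df, reduced_ss):
    """Extra-sum-of-squares F statistic from the error lines of two nested fits."""
    diff_df = reduced_df - full_df      # df for the Difference row
    diff_ss = reduced_ss - full_ss      # SS for the Difference row
    ms_diff = diff_ss / diff_df         # MS = SS / df
    ms_full = full_ss / full_df         # MS for Full Error
    return diff_df, diff_ss, ms_diff, ms_full, ms_diff / ms_full


# Error df and SS from the Full and Reduced model fits.
diff_df, diff_ss, ms_diff, ms_full, f_stat = extra_ss_f(31, 710.849, 33, 836.396)
print(f"Difference: df={diff_df}, SS={diff_ss:.3f}, MS={ms_diff:.2f}")
print(f"Full Error MS={ms_full:.2f}, F={f_stat}")
```

Because the SS values are rounded to three decimals, the F computed this way agrees with the tabled 2.737 only up to rounding in the last digit.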
c. 1 pt. F = 2.737, p = 0.08
Notes: Estimates / Custom Test on results from the Full model.
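The p-value can also be checked by hand. When the numerator df is exactly 2, as it is here, the upper tail of the F distribution has the closed form (1 + 2F/df2)^(-df2/2). The Python sketch below uses that special case only; it is not a general F calculator and not how JMP computes the value. The function name is made up for illustration.

```python
def f_upper_tail_df1_2(f_stat, df2):
    """P(F > f_stat) for an F distribution with 2 and df2 degrees of freedom.

    Uses the closed form (1 + 2 f / df2) ** (-df2 / 2), which holds only
    when the numerator degrees of freedom equal 2.
    """
    if f_stat < 0 or df2 <= 0:
        raise ValueError("f_stat must be >= 0 and df2 must be > 0")
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)


# F = 2.737 on 2 and 31 df, from parts b and c.
p_value = f_upper_tail_df1_2(2.737, 31)
print(round(p_value, 2))
```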
d. 1 pt. Yes, because the p-value is > 0.05. (Or, No because the p-value is < 0.10)
Notes: This p-value is definitely in a grey area. If it were much smaller (more significant), the result would be
clear: at least one of the two coefficients is not zero, so you shouldn’t remove the terms. If it were much larger
(less significant), the result would also be clear: no evidence that either is non-zero, so remove them.
3. Bat echolocation
a. 1 pt. When JMP creates the indicator variables:
the regression coefficients are log mass: 0.815, Type[birds]: 0.042, and Type[e-bats]: 0.018
root MSE: 0.186
Notes: If you constructed 0/1 indicator variables, you would get different estimates for the Type
coefficients but the same log mass slope and root MSE.
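The note above can be illustrated by converting the reported estimates to a 0/1 coding. This is a minimal sketch that assumes the Type[...] coefficients come from JMP's effect (sum-to-zero) coding, so the third type's effect is minus the sum of the two shown. The label "ne-bats" for that third type and the helper name are hypothetical.

```python
# Effect-coded estimates reported in part a.
effect = {"birds": 0.042, "e-bats": 0.018}

# Assumption: under sum-to-zero coding the omitted level's effect is
# minus the sum of the others. "ne-bats" is a hypothetical label.
effects_all = dict(effect)
effects_all["ne-bats"] = -sum(effect.values())


def to_indicator_coefs(effects, reference):
    """Convert sum-to-zero effects to 0/1 indicator coefficients.

    Each returned coefficient is that level's intercept shift relative
    to the chosen reference level.
    """
    return {
        level: value - effects[reference]
        for level, value in effects.items()
        if level != reference
    }


coefs = to_indicator_coefs(effects_all, "ne-bats")
for level, value in coefs.items():
    print(level, round(value, 3))
```

Under these assumptions the 0/1 coefficients relative to the third type come out near 0.102 for birds and 0.078 for e-bats. If β1 in part f is the e-bats indicator with that reference group, the 0.078 is consistent with the reported 0.079 up to rounding.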
b. 1 pt. β1 and β2, the coefficients for the two indicator variables, must both equal 0.
c. 1 pt. F = 0.43, p-value = 0.66
d. 1 pt. No evidence of a difference in mean log energy among the three types, when compared at the
same log mass.
e. 1 pt. F = 0.67, p-value = 0.53.
Notes: This is the test of the interaction (type * log mass).
f. 1 pt. The estimated β1 = 0.079, p = 0.70.
g. 1 pt. This was a think about question. Anything reasonable is acceptable.
The statistical reasons I can think of are:
Adding more observations gives a more precise estimate of the error sd
Adding information on another species may give a more precise estimate of the slope for log mass
If you have JMP color observations by the value of Type (plot below), you see that echolocating bats are
all quite small (green dots) and non-echolocating bats are all quite large (blue dots), while birds span
a wide range of sizes.
h. 1 pt. The residual plot looks pretty good to me. The vertical spread is slightly larger at large
predicted values, but this isn’t a very large change, and the data set is small.